Preface



The 9th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2004,
took place in Mexico for the third time in 16 years, since the first conference
was organized in Barcelona in January 1988. It was also the second time that the
conference was held in the state of Puebla. The first time, in 1996, the
Universidad de las Américas Puebla was in charge of the local organization of
the conference; this year it was the turn of the Instituto Nacional de
Astrofísica, Óptica y Electrónica, INAOE.

The 1996 conference was the last conference where all the papers were
presented in Spanish or Portuguese. Since then the proceedings have been
published in English by Springer in the LNAI series. This linguistic change was
a sign of the scientific maturity of the Ibero-American artificial intelligence
community and the best way for it to share with the international artificial
intelligence community the best results of many of its research groups. It was
also the way to open this forum to researchers of other countries to enrich the
scientific content of the conferences.

One relevant feature of the last four conferences with the proceedings
published in English by Springer is that, besides the participation of people
from many countries, the majority of papers came from Ibero-American
researchers. We can state that IBERAMIA has consolidated itself as the main
scientific forum where Ibero-American artificial intelligence researchers meet
every other year. In 2004 we received 304 papers, 97 of which were accepted;
this amounts to an acceptance rate of 31%. The figures are similar to those of
the 2002 Sevilla conference, with 316 received papers and 97 accepted papers.
The numbers of submitted and accepted papers per country are shown in the
following table:



Country     Submitted  Accepted     Country     Submitted  Accepted
Argentina           3         2     India               2         0
Austria             4         0     Iran                4         1
Belgium             1         1     Israel              1         0
Brazil             54        18     Korea              12         2
Canada              4         1     Mexico            103        28
Cuba                6         1     Portugal           17         5
Chile               6         4     Spain              56        26
China               2         1     Tunisia             4         1
France              7         1     USA                 2         2
Germany             2         0     Venezuela          12         3
UK                  2         0

The AI topics covered by the submitted and accepted papers can be seen in
the following table:



Topic                                                        Submitted  Accepted
Distributed artificial intelligence and multi-agent systems         28         7
Knowledge engineering and case-based reasoning                      21         4
Planning and scheduling                                             18         8
Machine learning and knowledge acquisition                          23         6
Natural language processing                                         34         8
Knowledge representation and reasoning                              20        10
Knowledge discovery and data mining                                 23         4
Robotics                                                            24         8
Computer vision                                                     32        13
Uncertainty and fuzzy systems                                       11         4
Genetic algorithms and neural networks                              45        15
AI in education                                                     14         4
Miscellaneous topics                                                11         5
Total                                                              304        97



IBERAMIA 2004 was organized as an initiative of the Executive Committee
of IBERAMIA. This committee is in charge of the planning and supervision
of IBERAMIA conferences. Its members are elected by the IBERAMIA board,
which itself is made up of representatives from the following Ibero-American
associations: AEPIA (Spain), APPIA (Portugal), SBC (Brazil), and SMIA (Mexico).
This book contains revised versions of the 97 papers selected by the program
committee for presentation and discussion during the conference. The volume
is structured into 13 thematic groups according to the topics addressed by the
papers.



Acknowledgements 

We would like to express our sincere gratitude to all the people who helped to
bring about IBERAMIA 2004. First of all, thanks to the contributing authors for
ensuring the high scientific standard of the conference and for their cooperation
in the preparation of this volume.

Special thanks are due to the members of the program committee and auxiliary
reviewers for their professionalism and their dedication in selecting the best
papers for the conference. Thanks also to the IBERAMIA Executive Committee
for their guidance and their continuous support.

We owe particular gratitude to the invited speakers for sharing with us their 
experiences and their most recent research results. 








Nothing would have been possible without the initiative and dedication of
the Organizing Committee and the support of INAOE. We are very grateful to
all the people who helped in the large variety of organizing tasks, namely
Hector Lopez, our web manager; Gabriela Lopez Lucio and Luis Villasenor Pineda,
our publicity managers; Josue Pedroza, for his great job with the management of
the CyberChair system during the submission and evaluation processes; Jesus A.
Gonzalez, Oscar E. Romero A. and Ivan Olmos, for their help in the preparation
of this book; Angelica Munoz, Guillermo de Ita, and Olac Fuentes, for their
contribution to the management of the tutorials and workshops; Gorgonio Ceron
Benitez and Carmen Meza Tlalpan, for their help in the management of local
arrangements and financial issues; Dulce Millan and Nidia Lara, for their useful
support in the administrative duties; and Lupita Rivera, for the contacts with
the media. All the members of the local committee headed by Carlos Alberto
Reyes did a great job.

Thanks to the invited speakers, the tutorial instructors and the workshop chairs
for giving more relevance to the Conference General Program.

The French-Mexican Laboratory of Informatics, LAFMI, supported part of 
the travel expenses of the Program Chair between Xalapa and Tonantzintla. 

We would like to thank the Benemérita Universidad Autónoma de Puebla for
its support of the inaugural session and first invited speech, held in the
beautiful and historical conference hall Salon Barroco. We also want to thank
the Universidad de las Américas Puebla for its logistical support during the
conference. We are also grateful to Microsoft Mexico, and especially to Luis
Daniel Soto, for its financial support and for its contribution of an invited
speaker and a tutorial. Our gratitude to Francisco Soto, Research and Graduate
Studies Director of INAOE, and Aurelio Lopez, Head of the Computer Science
Department of INAOE, for their continuous support throughout this year.

Tonantzintla, Puebla, November 2004

Christian Lemaitre, Program Chair
Carlos A. Reyes, Organization Chair
Jesus A. Gonzalez, Cyber Chair




IBERAMIA 2004 Organizing Committee 



Program and Scientific Chairman 

Christian Lemaitre 
Mexico 



Organization Chairman 

Carlos Alberto Reyes Garcia 
INAOE, Mexico 



Steering Committee 

Christian Lemaitre, LANIA, Mexico 
Alvaro de Albornoz, SMIA, Mexico
Arlindo Oliveira, APPIA, Portugal 
Federico Barber, AEPIA, Spain 
Francisco Garijo, Telefonica I+D, Spain 
Helder Coelho, University of Lisbon, Portugal
Jaime Sichman, SBC, Brazil 
Miguel Toro, University of Sevilla, Spain 



Program Committee 

Abraham Sanchez 
Agostino Poggi 
Alejandro Ceccatto 
Alexander Gelbukh
Alexis Drogoul 
Alberto Oliart Ros 
Amal El Fallah Seghrouchni 
Ana Garcia Serrano 
Ana Teresa Martins 
Analia Amandi 
Andre Ponce de Leon F. de
Andres Perez Uribe 
Angel P. del Pobil 
Anna Helena Reali Costa 
Antonio Bahamonde 
Antonio Ferrandez 



Antonio Moreno 
Ariadne Carvalho 
Arlindo Oliveira 
Arturo Hernandez Aguirre 
Beatriz Barros 
Bob Fisher 
Carlos A. Brizuela 
Carlos A. Coello Coello 
Carlos Alberto Reyes Garcia 
Carolina Chang 
Celso A. Kaestner 
Chilukuri K. Mohan 
Dibio Leandro Borges 
Duncan Gillies 
Ed Durfee 
Eduardo Morales 







Elisabeth Andre 
Enrique Sucar 
Ernesto Costa 
Eugene Santos 
Eugenio Oliveira 
Federico Barber 
Fernando Silva 
Francisco Cantu Ortiz 
Francisco J. Diez 
Franz Wotawa 
Gabriel Pereira Lopes 
Gabriela Henning 
Gabriela Ochoa Meier 
Geber Ramalho 
Gerhard Lakemeyer 
Gerson Zaverucha
Guilherme Bittencourt 
Guillermo Morales Luna 
Gustavo Arroyo Figueroa 
Humberto Sossa 
Jacek Malec 
Jean Pierre Briot 
Jim Little 

Joaquin Fdez-Valdivia 

Johan van Horebeek 

Jose A. Gamez Martin 

Jose Carlos Ferreira Maia Neves 

Jose Dorronsoro 

Jose Luis Gordillo 

Jose Riquelme Santos

Juan Flores 

Juan M. Corchado 

Juan Manuel Ahuactzin 

Juan Manuel Torres 

Juan Pavon 

Juergen Dix 

Katya Rodriguez Vazquez 
Kevin Knight 
Kwang Lee 

Leliane Nunes de Barros 
Leo Joskowicz 
Leonid Sheremetov 
Leopoldo Altamirano Robles 
Long Quan 

Luciano Garcia Garrido 



Luis Alberto Pineda 

Luis Correia 

Luis Marques Custodio 

Luis Villasenor 

Maarten van Someren 

Marcelo Finger 

Marcelo Ladeira 

Maria Carolina Monard 

Maria Cristina Riff 

Maria das Graças Bruno Marietto

Maria Fox 

Mario Koppen 

Matias Alvarado 

Mauricio Osorio Galindo 

Michael Gelfond

Michael Huhns

Michael M. Luck 

Michel Devy 

Nicandro Cruz 

Olac Fuentes 

Pablo Noriega 

Paul Brna 

Paulo Cortez 

Paulo Quaresma 

Pavel Brazdil 

Pedro Larrañaga

Pilar Gomez Gil 

Rafael Morales Gamboa 

Ramon Brena 

Raul Monroy 

Riichiro Mizoguchi 

Ronald C. Arkin 

Roque Marin 

Rosa Vicari 

Ruth Aylett 

Ryszard Klempous 

Salvador Abreu 

Simon Colton 

Stefano Cerri 

Thierry Fraichard 

Thomas G. Dietterich 

Toby Walsh 

William B. Langdon 

Yves Demazeau 







Additional Reviewers 



Aida Valls
Akiko Inaba 
Alejandro Zunino 
Alessandro Lameiras 
Alexandre da Silva 
Alexandra Suna 
Alicia Troncoso 
Aloisio Carlos de Pina 
Amanda Smith 
Amit Bhaya 
Ana Carolina Lorena 
Ana Paula Rocha 
Andreia Grisolio M. 
Andrew Coles 
Antonio Fernandez 
Antonio Garrido 
Antonio Lova 
Armando Matos 
Armando Suarez 
Arnaldo Mandel 
Aurelio Lopez Lopez 
Bernhard Peischl 
Brahim Hnich 
Carla Koike 
Carlos Brito 
Carlos Castillo 
Carlos Hitoshi Morimoto 
Cedric Pradalier 
Charles Callaway 
Daniel Koeb 
David Allen 
David Pearce 
Derek Long 
Edgardo Vellon 
Efren Mezura-Montes 
Elie Chadarevian 
Elizabeth Tapia 
Emmanuel Mazer 
Everardo Gutierrez 
Fabiola Lopes y Lopez 
Fabrício Enembreck
Federico Ramirez Cruz 
Fernando Carvalho 
Fernando Godinez D. 



Fernando Llopis 
Fernando Lopez 
Francine Bicca 
Francisco Ferrer 
Francisco Rodriguez 
Fredric Marc 
Giordano Cabral 
Gustavo Batista 
Gustavo Olague 
Hae Yong Kim 
Heidi J. Romero 
Hiram Calvo-Castro 
Huei Diana 
Hugo Jair Escalante 
Hugo Santana 
Ignacio Mayorga 
Ivan Olmos Pineda 
Ivana Sumida 
Jacques Robin 
Jacques Wainer 
Javier Giacomantone 
Javier Martinez-Baena 
Jeronimo Pellegrini 
Jesus Peral 
Joaquim Costa 
Joerg Muller 
John Lee 
Jose Alferes
Jose A. Garcia 
Jose A. Troyano 
Jose M. Puerta 
Jose Palma 
Juan Antonio Navarro 
Juan Carlos Lopez 
Julio Cesar Nievola 
Kamel Mekhnacha
Karina Valdivia Delgado 
Keith Halsey 
Ligia Ferreira 
Louise Seixas 
Luis Berdun 
Luis C. Gonzalez 
Luis Damas 
Luis Filipe Antunes 



Luis Miguel Rato 
Luis Paulo Reis 
Luis Sarmento 
M. Carmen Aranda 
Manuel Chi 
Manuel Mejia Lavalle 
Manuel Montes-y-Gomez 
Marcelino Pequeno 
Marcello Balduccini 
Marcelo Andrade T. 
Marcelo Armentano 
Marco Aurelio Pacheco 
Marcos Cunha 
Marc-Philippe Huget 
Mats Petter Pettersson 
Michel Ferreira 
Michele Tomaiuolo 
Miguel A. Salido 
Miguel Arias Estrada 
Mikal Ziane 
Nejla Amara 
Nelma Moreira 
Nick Campbell 
Nicolas Sabouret 
Nik Nailah Abdullah 
Oliver Obst 
Olivier Lebeltel P. 
Orlando Lee 
Pablo Granitto 
Pablo Verdes 
Paola Turci
Patricia A. Jaques 
Peter Gregory 
Pilar Tormos 
Rafael Munoz 
Raul Giraldez 
Reinaldo A.C. Bianchi
Renata Vieira 
Ricardo Azambuja S. 
Ricardo Bastos C. 
Ricardo Martins de A. 
Ricardo Silveira 
Riverson Rios 
Robinson Vida 







Rolando Menchaca M. 
Ronaldo Cristiano P. 
Rosa Rodriguez S. 
Samir Aknine 
Sandra Alves 
Silvia Schiaffino
Silvio do Lago P. 
Solange Oliveira R. 



Steve Prestwich
Thamar Solorio Martinez
Timothy Read 
Trilce Estrada Piedra 
Valdinei Freire 
Valguima Odakura 
Victoria Eyharabide 
Vilma França Fernandes



Vitor Beires Nogueira 
Vladik Kreinovich
Xavier Alaman 
Xavier Blanc 
Xose R. Fdez-Vidal
Yichen Wei 
Yingqian Zhang 




Table of Contents 



IBERAMIA 2004 

Distributed AI and Multi-agent Systems 

Checking Social Properties of Multi-agent Systems with Activity Theory
    Ruben Fuentes, Jorge J. Gomez-Sanz, Juan Pavon

MARCS Multi-agent Railway Control System
    Hugo Proença, Eugenio Oliveira

Dynamic Quality Control Based on Fuzzy Agents for Multipoint
Videoconferencing
    Jesus Bobadilla, Luis Mengual

A Component and Aspect-Based Architecture for Rapid Software
Agent Development
    Mercedes Amor, Lidia Fuentes, Jose Maria Troya

Formalization of Cooperation in MAS: Towards a Generic Conceptual
Model
    Monia Loulou, Ahmed Hadj Kacem, Mohamed Jmaiel

Web-Enabling MultiAgent Systems
    Eduardo H. Ramirez, Ramon F. Brena

Gaining Competitive Advantage Through Learning Agent Models
    Leonardo Garrido, Ramon Brena, Katia Sycara



Knowledge Engineering and Case Based Reasoning 



Towards an Efficient Rule-Based Coordination of Web Services
    Eloy J. Mata, Pedro Alvarez, Jose A. Banares, Julio Rubio

Applying Rough Sets Reduction Techniques to the Construction of a
Fuzzy Rule Base for Case Based Reasoning
    Florentino Fdez-Riverola, Fernando Diaz, Juan M. Corchado

Dynamic Case Base Maintenance for a Case-Based Reasoning System
    Maria Salamo, Elisabet Golobardes

A Case Base Seeding for Case-Based Planning Systems
    Flavio Tonidandel, Marcio Rillo



Planning and Scheduling 



Handling Numeric Criteria in Relaxed Planning Graphs
    Oscar Sapena, Eva Onaindia

Constrainedness and Redundancy by Constraint Ordering
    Miguel A. Salido, Federico Barber

To Block or Not to Block?
    Alejandro Gonzalez Romero, Rene Alquezar

Adaptive Penalty Weights When Solving Congress Timetabling
    Daniel Angel Huerta-Amante, Hugo Terashima-Marin

Decomposition Approaches for a Capacitated Hub Problem
    Inmaculada Rodriguez-Martin, Juan-Jose Salazar-Gonzalez

An Efficient Method to Schedule New Trains on a Heavily Loaded
Railway Network
    Laura Ingolotti, Federico Barber, Pilar Tormos, Antonio Lova,
    M. A. Salido, M. Abril

Studs, Seeds and Immigrants in Evolutionary Algorithms for
Unrestricted Parallel Machine Scheduling
    E. Ferretti, S. Esquivel, R. Gallard

An Investigation on Genetic Algorithms for Generic STRIPS Planning
    Marcos Castilho, Luis Allan Kunzle, Edson Lecheta, Viviane
    Palodeto, Fabiano Silva



Machine Learning and Knowledge Acquisition 



Improving Numerical Reasoning Capabilities of Inductive Logic
Programming Systems
    Alexessander Alves, Rui Camacho, Eugenio Oliveira

Enhanced ICA Mixture Model for Unsupervised Classification
    Patricia R. Oliveira, Roseli A. F. Romero

Analysis of Galactic Spectra Using Active Instance-Based Learning and
Domain Knowledge
    Olac Fuentes, Thamar Solorio, Roberto Terlevich, Elena Terlevich

Adapting Evolutionary Parameters by Dynamic Filtering for Operators
Inheritance Strategy
    Xavier Bonnaire, Maria-Cristina Riff

Collaborative Filtering Based on Modal Symbolic User Profiles:
Knowing You in the First Meeting
    Byron Bezerra, Francisco Carvalho, Gustavo Alves

Machine Learning by Multi-feature Extraction Using Genetic
Algorithms
    Leila S. Shafti, Eduardo Perez

Natural Language Processing 

Assignment of Semantic Roles Based on Word Sense Disambiguation
    Paloma Moreda Pozo, Manuel Palomar Sanz, Armando Suarez Cueto

Multi-session Management in Spoken Dialogue System
    Hoá Nguyen, Jean Caelen

Semantically-Driven Explanatory Text Mining: Beyond Keywords
    John Atkinson-Abutridy

An Electronic Assistant for Poetry Writing
    Nuno Mamede, Isabel Trancoso, Paulo Araujo, Ceu Viana

Improving the Performance of a Named Entity Extractor by Applying
a Stacking Scheme
    Jose A. Troyano, Victor J. Diaz, Fernando Enriquez, Luisa Romero

Automatic Text Summarization with Genetic Algorithm-Based
Attribute Selection
    Carlos N. Silla Jr., Gisele L. Pappa, Alex A. Freitas,
    Celso A. A. Kaestner

Coordination Revisited - A Constraint Handling Rule Approach
    Dulce Aguilar-Solis, Veronica Dahl

Question Answering for Spanish Based on Lexical and Context
Annotation
    Manuel Perez-Coutiño, Thamar Solorio, Manuel Montes-y-Gomez,
    Aurelio Lopez-Lopez, Luis Villaseñor-Pineda

Knowledge Representation and Reasoning 

A Max-SAT Solver with Lazy Data Structures
    Teresa Alsinet, Felip Manya, Jordi Planes

Three Valued Logic of Lukasiewicz for Modeling Semantics of Logic
Programs
    Mauricio Osorio, Veronica Borja, Jose Arrazola

Answer Set Programming and S4
    Mauricio Osorio, Juan Antonio Navarro

A Rippling-Based Difference Reduction Technique to Automatically
Prove Security Protocol Goals
    Juan Carlos Lopez, Raul Monroy

On Some Differences Between Semantics of Logic Program Updates
    Joao Alexandre Leite

Towards CNC Programming Using Haskell
    G. Arroyo, C. Ochoa, J. Silva, G. Vidal

Well Founded Semantics for Logic Program Updates
    F. Banti, J. J. Alferes, A. Brogi

A First Order Temporal Logic for Behavior Representation
    Carlos Rossi, Manuel Enciso, Angel Mora

Improved Tupling for Optimizing Multi-paradigm Declarative Programs
    Soledad Gonzalez, Gines Moreno

Polynomial Classes of Boolean Formulas for Computing the Degree of
Belief
    Guillermo De Ita Luna

Knowledge Discovery and Data Mining 

Combining Quality Measures to Identify Interesting Association Rules
    Edson Augusto Melanda, Solange Olivera Rezende

Two Partitional Methods for Interval-Valued Data Using Mahalanobis
Distances
    Renata M.C.R. de Souza, Francisco A.T. de Carvalho,
    Camilo P. Tenorio

A Classifier for Quantitative Feature Values Based on a Region
Oriented Symbolic Approach
    Simith T. D'Oliveira Junior, Francisco A.T. de Carvalho, Renata
    M.C.R. de Souza

The Protein Folding Problem Solved by a Fuzzy Inference System
Extracted from an Artificial Neural Network
    Eduardo Battistella, Adelmo Luis Cechin

Robotics 

A Multi-robot Strategy for Rapidly Searching a Polygonal Environment
    Alejandro Sarmiento, Rafael Murrieta-Cid, Seth Hutchinson

Internet-Based Teleoperation Control with Real-Time Haptic and
Visual Feedback
    Fernando D. Von Borstel, Jose L. Gordillo

Representation Development and Behavior Modifiers
    Carlos R. de la Mora B, Carlos Gershenson,
    V. Angelica Garcia-Vega

New Technique to Improve Probabilistic Roadmap Methods
    Antonio Benitez, Daniel Vallejo

A New Neural Architecture Based on ART and AVITE Models for
Anticipatory Sensory-Motor Coordination in Robotics
    J. L. Pedreño-Molina, O. Florez-Giraldez, J. Lopez-Coronado

Development of Local Perception-Based Behaviors for a Robotic Soccer
Player
    Antonio Salim, Olac Fuentes, Angelica Munoz

Statistical Inference in Mapping and Localization for Mobile Robots
    Anita Araneda, Alvaro Soto

Fusing a Laser Range Finder and a Stereo Vision System to Detect
Obstacles in 3D
    Leonardo Romero, Adrian Nunez, Sergio Bravo, Luis E. Gamboa

Adaptive Automata for Mapping Unknown Environments by Mobile
Robots
    Miguel Angelo de Abreu de Sousa, Andre Riyuiti Hirakawa,
    Joao Jose Neto

Computer Vision 

Digital Image Processing of Functional Magnetic Resonance Images to
Identify Stereo-Sensitive Cortical Regions Using Dynamic Global Stimuli
    Hector-Gabriel Acosta-Mesa, Nicandro Cruz-Ramirez, John Frisby,
    Ying Zheng, David Buckley, Janet Morris, John Mayhew

An Image Analysis System to Compute the Predominant Direction of
Motion in a Foucault Pendulum
    Joaquin Salas, Jesica Flores

A Perceptual User Interface Using Mean Shift
    Edson Prestes, Anderson P. Ferrugem, Marco A. P. Idiart,
    Dante A. C. Barone

Projected Fringe Technique in 3D Surface Acquisition
    Carlos Diaz, Leopoldo Altamirano

Optimized Object Recognition Based on Neural Networks Via
Non-uniform Sampling of Appearance-Based Models
    Luis Carlos Altamirano, Matias Alvarado

A Statistical Validation of Vessel Segmentation in Medical Images
    Francisco L. Valverde, Nicolas Guil, Enrique Dominguez, Jose Munoz

Structural Recognition with Kernelized Softassign
    Miguel Angel Lozano, Francisco Escolano

Kernel Based Method for Segmentation and Modeling of Magnetic
Resonance Images
    Cristina Garcia, Jose Ali Moreno

Real-Valued Pattern Recall by Associative Memory
    Humberto Sossa, Ricardo Barron, Jose L. Oropeza

Binary Associative Memories Applied to Gray Level Pattern Recalling
    Humberto Sossa, Ricardo Barron, Francisco Cuevas, Carlos Aguilar,
    Hector Cortes

Selection of an Automated Morphological Gradient Threshold for Image
Segmentation. Application to Vision-Based Path Planning
    F. A. Pujol, P. Suau, M. Pujol, R. Rizo, M. J. Pujol

Color Image Classification Through Fitting of Implicit Surfaces
    Raziel Alvarez, Erik Millan, Ricardo Swain-Oropeza,
    Alejandro Aceves-Lopez

Transforming Fundamental Set of Patterns to a Canonical Form to
Improve Pattern Recall
    Humberto Sossa, Ricardo Barron, Roberto A. Vazquez

Uncertainty and Fuzzy Systems 

Nonlinear System Identification Using ANFIS Based on Emotional
Learning
    Mahdi Jalili-Kharaajoo

Improving k-NN by Using Fuzzy Similarity Functions
    Carlos Morell, Rafael Bello, Ricardo Grau

Decomposing Ordinal Sums in Neural Multi-adjoint Logic Programs
    Jesus Medina, Enrique Merida-Casermeiro, Manuel Ojeda-Aciego

Comparing Metrics in Fuzzy Clustering for Symbolic Data on SODAS
Format
    Alzennyr Silva, Francisco Carvalho, Teresa Ludermir,
    Nicomedes Cavalcanti

Genetic Algorithms and Neural Networks 

Estimating User Location in a WLAN Using Backpropagation Neural
Networks
    Edgar A. Martinez, Raul Cruz, Jesus Favela

On the Optimal Computation of Finite Field Exponentiation
    Nareli Cruz-Cortes, Francisco Rodriguez-Henriquez,
    Carlos A. Coello Coello

Particle Swarm Optimization in Non-stationary Environments
    Susana C. Esquivel, Carlos A. Coello Coello

An Efficient Learning Algorithm for Feedforward Neural Network
    Songbo Tan, Jun Gu

Combining Data Reduction and Parameter Selection for Improving
RBF-DDA Performance
    Adriano L. I. Oliveira, Bruno J. M. Melo, Fernando Buarque L.
    Neto, Silvio R.L. Meira

Bidirectional Neural Network for Clustering Problems
    Enrique Dominguez, Jose Munoz

Reducing the Complexity of Kernel Machines with Neural Growing
Gas in Feature Space
    Luigina D'Amato, Jose Ali Moreno, Rosa Mujica

Multirecombined Evolutionary Algorithm Inspired in the Selfish Gene
Theory to Face the Weighted Tardiness Scheduling Problem
    A. Villagra, M. De San Pedro, M. Lasso, D. Pandolfi

A Novel Approach to Function Approximation: Adaptive Multimodule
Regression Networks
    Wonil Kim, Chuleui Hong, Changduk Jung

A Novel Hybrid Approach of Mean Field Annealing and Genetic
Algorithm for Load Balancing Problem
    Chuleui Hong, Wonil Kim, Yeongjoon Kim

Geodesic Topographic Product: An Improvement to Measure Topology
Preservation of Self-Organizing Neural Networks
    Francisco Florez Revuelta, Juan Manuel Garcia Chamizo,
    Jose Garcia Rodriguez, Antonio Hernandez Saez

A Genetic Algorithm for the Shortest Common Superstring Problem
    Luis C. Gonzalez-Gurrola, Carlos A. Brizuela, Everardo Gutierrez

Improving the Efficiency of a Clustering Genetic Algorithm
    Eduardo R. Hruschka, Ricardo J. G. B. Campello,
    Leandro N. de Castro

The Hopfield Associative Memory Network: Improving Performance
with the Kernel “Trick”
    Cristina Garcia, Jose Ali Moreno

A Cultural Algorithm with Differential Evolution to Solve Constrained
Optimization Problems
    Ricardo Landa Becerra, Carlos A. Coello Coello







AI in Education 

An Approach of Student Modelling in a Learning Companion System
    Rafael A. Faraco, Marta C. Rosatelli, Fernando A. O. Gauthier

A BDI Approach to Infer Students Emotions
    Patricia A. Jaques, Rosa M. Viccari

Mobile Robotic Supported Collaborative Learning (MRSCL)
    Ruben Mitnik, Miguel Nussbaum, Alvaro Soto

Evaluation of the Teaching-Learning Process with Fuzzy Cognitive Maps
    Ana Lilia Laureano-Cruces, Javier Ramirez-Rodriguez,
    Amador Teran-Gilmore

Miscellaneous Topics 

Using Simulated Annealing for Discrete Optimal Control Systems Design
    Horacio Martinez-Alfaro, Martin A. Ruiz-Cruz

Determination of Possible Minimal Conflict Sets Using Constraint
Databases Technology and Clustering
    M. T. Gomez-Lopez, R. Ceballos, R. M. Gasca, S. Pozo

Implementation of a Linguistic Fuzzy Relational Neural Network for
Detecting Pathologies by Infant Cry Recognition
    Israel Suaste-Rivas, Orion F. Reyes-Galviz, Alejandro Diaz-Mendez,
    Carlos A. Reyes-Garcia

Adding Personality to Chatterbots Using the Persona-AIML
Architecture
    Adjamir M. Galvao, Flavia A. Barros, Andre M.M. Neves,
    Geber L. Ramalho

DIMEx100: A New Phonetic and Speech Corpus for Mexican Spanish
    Luis A. Pineda, Luis Villasenor Pineda, Javier Cuetara,
    Hayde Castellanos, Ivonne Lopez



Author Index




Checking Social Properties of Multi-agent Systems 
with Activity Theory 



Ruben Fuentes, Jorge J. Gomez-Sanz, and Juan Pavon 

Universidad Complutense Madrid, Dep. Sistemas Informaticos y Programacion, 
28040 Madrid, Spain* 

{ruben, jjgomez, jpavon}@sip.ucm.es
http://grasia.fdi.ucm.es

Abstract. Many approaches to the agent paradigm emphasize the social and
intentional features of their systems, which are called social properties. The
study of these aspects demands new techniques of its own. Traditional Software
Engineering approaches cannot manage all the information about these
components, which are as closely related to social disciplines as to software
development. Following previous work, this paper presents a framework based on
Activity Theory to specify and verify social properties in a development
process for multi-agent systems. Using this framework, developers acquire tools
for requirements elicitation and traceability, for detecting inconsistencies in
their specifications, and for gaining new insights into their systems. The way
of working with these tools is shown with a case study.

Keywords: Multi-agent Systems Development, Activity Theory, Validation 
and Verification. 



1 Introduction 

Multi-Agent Systems (MAS) are usually conceived as organizations of autonomous
and rational entities that work together to achieve their common goals. This
perspective implies that aspects of organization, cognition, development, and
motivation have to be studied in depth. However, existing MAS methodologies usually
reduce the analysis of these social properties to a problem of defining roles and
power relationships (as is the case for INGENIAS [14] or KAOS [2]). Since social
features comprise a richer set of aspects, they demand additional theoretical
background, abstractions, and techniques.

As a source for this required knowledge, we have considered research in the social
sciences, specifically Activity Theory. Activity Theory (AT) [10] is a
cross-disciplinary framework for the study of human activity embedded in its social,
cultural, and historical context. It considers that the social and individual levels of
human activities are intrinsically interleaved. Besides, the social component also
includes the historical development of those activities as the experience collected by
the society carrying them out. Our previous work introduced an approach to model
some social aspects of MAS with AT. This approach includes a UML language for AT
concepts and processes to use its techniques in MAS development. In [3] we used
AT as a basis to detect contradictions in the MAS design process. In [5], a similar
AT-based schema was the tool to identify requirements. The evolution of this work
leads us to consider that requirements and contradictions can be seen as particular
cases of social properties. Consequently, AT is suggested as an appropriate framework
to address the use of social properties in MAS development.

In the AT framework for MAS, social properties are described through patterns
that include a textual explanation and a diagram in the UML language for AT. Each
property has a set of match patterns, which allow its detection, and a set of solution
patterns, which suggest modifications to the models. The application of these patterns
to a concrete methodology is possible thanks to the use of mappings. Mappings
specify how to translate concepts from AT to those in the MAS methodology. These
correspondences make the approach applicable to different MAS
methodologies and enable a semi-automated method to work with social properties.
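
The role of mappings can be illustrated with a minimal sketch. It is not the tool used in this work: the concept names in the mapping, the triple representation of patterns and models, and the functions translate and matched_fraction are assumptions made only for illustration, and matching is done on concept types rather than on concrete model instances.

```python
from typing import Dict, List, Tuple

# A pattern or model is a list of (source concept, relation, target concept) triples.
Triple = Tuple[str, str, str]

# Hypothetical mapping from AT concepts to those of a target MAS methodology.
# The concept names on both sides are illustrative assumptions.
AT_TO_MAS: Dict[str, str] = {
    "Subject": "Agent",
    "Objective": "Goal",
    "Activity": "Task",
    "Tool": "Resource",
}


def translate(pattern: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite the concepts of an AT pattern in the target methodology's vocabulary."""
    return [(mapping[src], rel, mapping[dst]) for src, rel, dst in pattern]


def matched_fraction(pattern: List[Triple], model: List[Triple]) -> float:
    """Fraction of pattern triples found in the model (1.0 means a complete match)."""
    if not pattern:
        return 0.0
    present = sum(1 for triple in pattern if triple in model)
    return present / len(pattern)


# A match pattern expressed with AT concepts.
at_pattern = [("Subject", "pursues", "Objective"), ("Subject", "uses", "Tool")]

# The same pattern after applying the mapping.
mas_pattern = translate(at_pattern, AT_TO_MAS)

# A toy model written in the target methodology's vocabulary.
model = [("Agent", "pursues", "Goal"), ("Agent", "performs", "Task")]

print(mas_pattern)
print(matched_fraction(mas_pattern, model))
```

Because the pattern is written once with AT concepts and only the mapping changes per methodology, the same property could in principle be checked against models of several methodologies.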

So far, we have classified social properties into three types according to their role in
development. These properties can represent configurations of the system that
developers have to preserve (e.g. requirements) or to avoid (e.g. contradictions in the
information), or knowledge about the MAS (e.g. the existing organization).

The remainder of the paper has six additional sections. Section 2 discusses the
need for new techniques to cope with the use of social properties inherent to the agent
paradigm. Then, Section 3 explains the way of describing social properties in terms of
the UML language for AT concepts, while Section 4 gives some examples of these
properties in the three identified categories. Section 5 describes a method, which can
be automated, to use the social properties in the validation and verification of models.
Section 6 shows the use of this method with a case study on a real specification.
Finally, the conclusions discuss the results obtained in the validation and verification
of social and intentional properties of MAS with this approach.

2 The Need of Tools for Social Features in MAS 

Software Engineering involves continuous research into new concepts and
methodologies that make it possible to build more complex software systems. One of
the main advantages of the agent paradigm is that it constitutes a natural metaphor for
systems with purposeful interacting agents, and this abstraction is close to the human
way of thinking about our own activities [11]. This foundation has led to an
increasing interest in social sciences (as in the works of [2], [11], and [15]) as a
source for new concepts and methods to build MAS. However, this interest has hardly
covered some interesting social features of MAS design. Here, the term “social
features” encompasses organizational culture, politics, leadership, motivation, morale,
trust, learning, or change management. There are two main reasons for MAS research
to give new and special attention to these features: the human context and the very
essence of the agent abstractions.

Firstly, the environment of a software system, defined as the real world outside it,
is usually a human activity system [13]. Its study must therefore consider the social
features of humans and their societies. Software Engineering research in branches like
Requirements Engineering [1], CSCW [6], or HCI [9] already makes extensive
use of social disciplines to grasp the relevant information about the human context.

Besides the human context, MAS are modelled in a fashion very similar to human
organizations, as societies of collaborating intentional entities [11], [16]. This makes it
possible to describe some properties of the system and its context, the social and
intentional ones, in quite a uniform way, taking advantage of the knowledge extracted
from human sciences.

These reasons lead us to consider the need for generic mechanisms to model these
social properties of the MAS and to validate the specifications against them. These
properties do not appear in traditional software methodologies. If MAS design is to
be a new level of abstraction for system design, it needs to develop innovative tools
that consider these social concepts. Previous experience with the description and
checking of social properties about contradictions [3] and requirements [5] leads us to
propose the use of the AT for this purpose.

3 Describing Social Properties with AT 

Before considering the best manner of representing a social property for
development, we must consider that it can represent several types of information
for that development: it can correspond to a configuration to keep, i.e. a pattern
property, to a configuration to avoid, i.e. an anti-pattern property, or to a feature to
discover, i.e. a descriptive property. In the case of a pattern property, a problem occurs
when there is no match with the pattern or just a partial one. For configurations to avoid,
problems arise when there is a complete match. The descriptive properties help
developers to discover new information about a given system, so there are no
conflictive situations inherently related to their absence or presence.
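
The three roles described above amount to a small decision rule. The following Python sketch is only an illustration of that rule; the names `PropertyKind`, `MatchLevel` and `is_problem` are assumptions introduced here and are not part of the AT tooling.

```python
from enum import Enum


class PropertyKind(Enum):
    PATTERN = "pattern"            # configuration to keep
    ANTI_PATTERN = "anti-pattern"  # configuration to avoid
    DESCRIPTIVE = "descriptive"    # feature to discover


class MatchLevel(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


def is_problem(kind: PropertyKind, level: MatchLevel) -> bool:
    """Return True when the match level signals a conflictive situation."""
    if kind is PropertyKind.PATTERN:
        # A configuration to keep is violated by a missing or partial match.
        return level is not MatchLevel.FULL
    if kind is PropertyKind.ANTI_PATTERN:
        # A configuration to avoid is a problem only on a complete match.
        return level is MatchLevel.FULL
    # Descriptive properties only report information about the system.
    return False
```

For instance, `is_problem(PropertyKind.PATTERN, MatchLevel.PARTIAL)` flags a requirement that is only partially preserved, whereas a descriptive property never raises a problem.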

According to these possible roles of social properties in MAS development, their
representation has to fulfil three aims. Firstly, the properties have to be a tool for
development. Their description should be in a language understandable by developers
and suitable for automated processing. Secondly, the representation should build a
common language between customers and developers. The development of a system
is a joint venture of people with very different backgrounds, which makes mutual
understanding difficult. AT vocabulary uses social abstractions, which are close to
both customers and developers and therefore should facilitate their mutual
understanding [1]. Finally, the representation of properties should help to solve the
problems related to them, such as the non-accomplishment of requirements or
the appearance of contradictions.

To satisfy these requirements, social properties are represented with two
components: a set of match patterns and another of solution patterns. Each pattern of
these components has two possible representations, a textual form and another one
based on the use of UML stereotypes to represent AT concepts [3]. The UML form is
the basis for the automated process of pattern detection and problem solving. In this
process, the stereotypes and names of the UML form can be variables or fixed values.
This allows fixing some values of the properties before the detection procedure and
combining patterns through shared values. On the other hand, the textual form is
intended to give further information about the patterns. It helps customers to
understand the meaning of the UML notation used and enables both customers and
developers to know the social interpretation of the pattern.

According to our description, a social property can include two different sets of
patterns: one for detection and another for solution. A match pattern describes a set of
entities and their relationships, which represents the property. It acts as a frame that
has to be instantiated with information from the specification. If a set of artefacts in
the specification fits into the pattern, the property is satisfied. In the case of both
properties to keep and properties to avoid, the solution pattern is a rearrangement of
its corresponding match pattern, maybe with additional elements. Solution patterns
can correspond to partial or full matches. For the total absence of a match, there is no
point in defining a solution.
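
The idea of a match pattern as a frame whose variables are instantiated from the specification can be sketched as follows. This is a minimal illustration under stated assumptions, not the detection procedure of the paper: a model is reduced to (source, relation, target) triples, names starting with `?` are variables, and the relation names used in the usage note (`executes`, `produces`) are hypothetical. Only full matches are handled here.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (source, relation, target)
Binding = Dict[str, str]        # variable name -> value from the specification


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables; anything else is a fixed value."""
    return term.startswith("?")


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match(pattern: List[Triple], spec: List[Triple],
          binding: Optional[Binding] = None) -> List[Binding]:
    """Return every binding under which all pattern triples occur in spec."""
    binding = binding or {}
    if not pattern:
        return [binding]
    results: List[Binding] = []
    for fact in spec:
        extended = unify(pattern[0], fact, binding)
        if extended is not None:
            results.extend(match(pattern[1:], spec, extended))
    return results
```

With `pattern = [("?subject", "executes", "?task"), ("?task", "produces", "?outcome")]`, calling `match(pattern, spec)` returns one binding per group of artefacts that fits the frame. Passing an initial binding such as `{"?subject": "Agent1"}` fixes a value before detection, and two patterns that use the same variable name are combined through that shared value.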

It is worth noting that patterns for detection and solution are not tied in fixed
pairs; moreover, they can be reused or combined through shared variables to describe
new situations. An example of this possibility can be found in the case study
described in section 6, which combines match patterns to describe situations that are
more complex than the original ones.

4 Social Properties for MAS 

This section presents examples of properties used in MAS development for every type 
previously identified and gives their representation. The first one corresponds to a 
pattern property that is a requirement describing a social setting in the system 
environment, and it is adapted from the Activity Checklist [9]. The second one is an 
anti-pattern property describing a contradiction from the AT, the Exchange Value 
contradiction extracted from [8]. Finally, there is a descriptive property about a 
hierarchical organization as described in [16]. In the present section, the textual form 
of the properties is included as part of the explanation of the diagrams. Words in 
italics represent concepts from the AT vocabulary. 




Fig. 1. A question about the system context 



The Activity Checklist [9] is an analytical tool to elicit contextual knowledge
about an activity. It is composed of a set of aspects that have questions expressed in
natural language to grasp their information. One of these aspects is
concerned with the role that current technology plays in producing the desired
outcomes. A question related to this aspect is “What are the work activities
involving the target system?”. Fig. 1 shows that the relevance of a tool in the
organization is given by the activities in which it participates. The study of these
activities helps developers and customers to assess the importance of the given
technology.

The second example adapts the Exchange Value contradiction described by [8]. In
a MAS, this contradiction emerges when a subject has to generate a product to be
consumed by other members of the community. However, none of the generator
subject’s goals are satisfied by that product, or the subject is not motivated enough to
do it, for example because the task also has negative effects. Consequently, the subject
that is able to create the product and give it to the community refuses to do it. The left
side of Fig. 2 illustrates a version of this contradiction in which the agent does not have
an objective related to the creation of the outcome, and for that reason the task never gets
executed. Therefore, the members of the community are unable to satisfy their needs.
The transition of the required product, i.e. Outcome 1, from outcome in Activity 1 to
artifact in Activity 2 is indicated with the relation “change of role”.

Fig. 2. The Exchange Value contradiction on the left and a possible solution on the right



One way to overcome the Exchange Value contradiction is to encourage the
subject by giving it a reward for executing the task. The subjects that benefit from
the product of the first subject should provide it with products that satisfy one of its
needs. The model on the right of Fig. 2 shows this new situation.

The final example is an identification of the social structures in MAS. In the
overview of MAS by [16], several types of possible organizations for the
community of agents were identified. A hierarchy was defined as an organization
where “The authority for decision making and control is concentrated in a single
problem solver [...]. Superior agents exercise control over resources and decision
making.” This situation can be described with Fig. 3. The Superior Agents can give
orders that Subordinated Agents have to accomplish, that is, Superior Agents are able
to generate new objectives for their Subordinated Agents. The relationship labelled
“change of role” represents that the outcome of the activity that generates the orders,
i.e. the Objective, becomes an objective for the Subordinated Agent.




Fig. 3. A hierarchical organization in a MAS 



Detecting this kind of descriptive property in the MAS gives developers
information about what their system is like. This knowledge can help them to decide
the best design solution to challenges in their systems. Examples of this are ways to solve
negotiation problems in a MAS according to its organizational structure.

5 Checking a Specification with Social Properties 

This section introduces a method to check social properties against a specification. It 
is a generalization of the one used for AT contradictions [4]. 

The method needs three parameters: 

• Mappings. They allow translations between the concepts of AT and the given
agent-oriented methodology. In this way, the method becomes generic, as it can
be applied to any agent-oriented methodology without demanding a vocabulary
based on AT concepts. A more detailed description of how to build these
mappings and an example with the INGENIAS methodology [14] can be found
in [4].

• Social properties to verify. They are described as shown in preceding sections.
They can be predefined, e.g. requirements [5] or contradiction patterns [3] of the
AT, or defined by users. The process itself can then be regarded as “validation”,
i.e. when its patterns are requirements, or as “verification”, i.e. when its patterns
represent other kinds of properties.

• MAS to check. Since the process uses mappings, there is no prerequisite about the
language of the specification as long as it is based on the agent paradigm.

The checking process itself includes the following steps: 

1. Translate the MAS specifications to the AT language with the mappings.
Translation is not a trivial task since correspondences between structures and
their translations are usually many-to-many relations. So, the translated
structure needs to keep a reference to the original one.

2. For every property, look for correspondences of its match patterns in the
specification. The process of property detection is one of pattern matching.
Models are traversed seeking groups of elements with the same structure and
slot values as the match patterns. When a corresponding structure is found, the
property is considered satisfied. Partial matches are also possible, as they can
represent a conflictive situation, for example a requirement which is not
preserved.

3. For a match in the models, propose the customized solution pattern. A match
pattern, or a part of it, can have a related solution pattern. This pattern describes
a change in the models to solve a problem or enhance some aspect of the
specifications. Typically, these solutions are rearrangements of the elements
involved in the match pattern where additional elements can be involved.

This method can be semi-automated, but user interaction is still needed to
determine values for the variables in patterns, decide when a match makes real sense,
or choose the best manner to modify the models.
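
The three steps of the checking process can be outlined in code. The sketch below is a deliberately simplified assumption: matching is reduced to checking which AT concept types are present, whereas the method described above matches structures and slot values. All names (`Translated`, `SocialProperty`, `translate`, `detect`, `propose`) and the example mappings are hypothetical; the actual INGENIAS mappings are those of [4] and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Element = Tuple[str, str]  # (methodology concept, element name)


@dataclass
class Translated:
    at_type: str      # AT concept, e.g. "Subject"
    name: str
    source: Element   # reference back to the original element (step 1)


@dataclass
class SocialProperty:
    name: str
    required: List[str]   # AT concepts needed by the match pattern
    solution: str         # textual form of the solution pattern


def translate(spec: List[Element],
              mappings: Dict[str, List[str]]) -> List[Translated]:
    """Step 1: translate the specification, keeping a link to the source."""
    return [Translated(at_type, name, (concept, name))
            for concept, name in spec
            for at_type in mappings.get(concept, [])]


def detect(prop: SocialProperty,
           model: List[Translated]) -> Tuple[str, List[str]]:
    """Step 2: classify the match as "full", "partial" or "none"."""
    present = {t.at_type for t in model}
    found = [c for c in prop.required if c in present]
    if len(found) == len(prop.required):
        return "full", found
    return ("partial" if found else "none"), found


def propose(prop: SocialProperty, status: str) -> Optional[str]:
    """Step 3: offer the solution pattern for full or partial matches."""
    return prop.solution if status != "none" else None
```

The decisions left to the user in the text (choosing variable values, judging whether a match makes sense, selecting a modification) are outside this sketch; it only shows where they would plug in, namely between `detect` and `propose`.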

The main advantages of the overall approach over others that import social sciences
into Software Engineering (like those in [1], [6], and [12]) are that this proposal uses
UML, which is well known to developers and more adequate for users than formal
languages, and that it provides a structured method to work with its social properties.

6 Case Study 

In order to show how to apply the social properties and the checking process, this
paper presents a case study based on agent teams in the programming environment
Robocode [7]. Robocode simulates tank battles through robots that act with
predefined primitives (like ahead, turn left, and fire) and perceive the situation with
position and radar sensors and the detection of some events. Developers have to
program the behaviour of those tanks to destroy their enemies and survive the battle.

The proposed case study considers collaboration between tanks in this frame. It is
modelled with the INGENIAS methodology [14] and its full specification can be
found at http://ingenias.sourceforge.net. The case considers armies of collaborative
tanks composed of squadrons. In every squadron, agents can play one of two roles:

• Soldiers. These are the basic members of the squadron. They try to survive the
battle while destroying their enemies and accomplishing the orders of their
captain.

• Captains. They are the leaders of the squadrons. Besides showing the basic
performance of a soldier, a captain also plans a strategy for its soldiers, communicates
it to them, tracks the course of the battle, and makes modifications to the planning if
needed. A squadron can have just one captain but several soldiers.

The previous situation is summarized in Fig. 4. The elements involved are:
DynamicArmy, which is an organization; DynamicGroup, which is a group of tanks;
TeamAgent, TeamMate and TeamLeader, which are agents; and Soldier and Captain,
which are roles. Circles represent goals. A TeamLeader pursues CommandSoldiers.
The task to satisfy this objective generates and communicates orders to the captain's
troops. A TeamMate of these troops tries to satisfy the goal AccomplishOrders. Thus,
it tries to obey the orders of its TeamLeader. A common goal for all the agents, which
is inherited from the TeamAgent, is SurviveBattle. This goal forces the agents to
preserve their life, whatever the situation may be.

Fig. 4. Specification of a Robocode army with INGENIAS



A possible strategy for the squadron is to have scouts that go forward and 
backward from its lines in order to determine the position of enemy troops. With this 
information its leader can plan an attack in a very precise way. The problem is that 
this is a high-risk task for the agent who accomplishes it, which contradicts its main 
objective of SurviveBattle. Fig. 5 represents this situation where GeneratePlan is a 
task for a TeamLeader and Explore is for a TeamAgent. 



[Diagram. Elements: Explore, InformationAboutEnemy, GeneratePlan, LocateEnemy, SurviveBattle, CommandSoldiers. Relations: «WFProduces», «WFConsumes», «GTSatisfies», «ContributeNegatively».]


Fig. 5. Tasks, goals, and mental entities involved in the explore strategy 



Since Explore is a task that contributes negatively to a compulsory goal, i.e.
SurviveBattle, the TeamAgent can avoid its execution for the sake of other objectives.
However, this task is vital to the global strategy of its squadron. This situation
corresponds to a variant of the Exchange Value contradiction that can be seen in Fig.
6 if the SoldierOrders entity is suppressed.

Given the Exchange Value contradiction, the solution pattern previously
introduced would suggest the addition of a new activity to the TeamLeader, which
would allow it to reward the TeamAgent. Nevertheless, this is not necessary,
given that the TeamLeader and the TeamAgent are embedded in a
Hierarchy pattern (presented in Fig. 3), which enables the TeamLeader to solve the
contradiction through commands without additional activities. The solution appears in
Fig. 6, where the entities in the rectangle are the part added to the Exchange Value
contradiction because of the hierarchy pattern.

Fig. 6. Exchange Value contradiction and its solution in a hierarchy



This final graphical model in Fig. 6 deserves some careful consideration. First,
the combined use of several patterns has allowed building a complex pattern and an
innovative solution, different from the standard one with a new activity. The second
observation is that the solution only works because of a qualitative arithmetic of goals.
The TeamAgent has to know that AccomplishOrders is even more relevant than
SurviveBattle; otherwise, a different conflict between contradictory objectives arises.
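
The paper does not define the qualitative arithmetic of goals. As one possible reading, assumed here purely for illustration, the agent can decide according to the most relevant goal that a task affects. The function name `executes`, the +1/-1 encoding, and the assumption that Explore contributes positively to AccomplishOrders once it has been ordered are all hypothetical.

```python
from typing import Dict, List


def executes(task_effects: Dict[str, int], ranking: List[str]) -> bool:
    """Decide by the most relevant goal that the task affects.

    task_effects maps a goal name to +1 (the task helps satisfy it)
    or -1 (the task contributes negatively to it).
    ranking lists the agent's goals from most to least relevant.
    """
    for goal in ranking:
        effect = task_effects.get(goal, 0)
        if effect != 0:
            return effect > 0
    # The task affects none of the agent's goals, so nothing motivates it.
    return False


# Assumed effects of Explore once the TeamLeader has ordered it.
explore = {"AccomplishOrders": +1, "SurviveBattle": -1}
```

Under this reading, `executes(explore, ["AccomplishOrders", "SurviveBattle"])` is true, while reversing the ranking makes the TeamAgent refuse the task, which is the residual conflict mentioned above.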

Independently of the solution adopted, the final decisions about the adequacy of 
the match, the proposed solution, and other patterns always require human judgement. 

7 Conclusions 

This paper gives a novel approach to manage social properties in MAS specifications.
Social properties are related to the most specific components of the agent paradigm,
i.e. its social and intentional features. Social sciences have studied this kind of
feature for a long time in human societies and can provide useful insights into them.
Agent-Oriented Software Engineering can profit from this knowledge to improve its
own understanding of these aspects and create new techniques that work with the
complex interactions between its systems and the surrounding human environment.

Following previous work [3], [4], [5], this proposal uses the Activity Theory as the
foundation to study these social properties. The knowledge of the AT has crystallized
in several tools for MAS development: a way to define social properties, a process
to check them in MAS specifications, and a library of social properties.

The description of social properties with two representations, one in natural
language and another in UML, tries to make them understandable for both customers
and developers. Customers know the real domain of the system and have to
communicate that information to the developers, who use it in the implementation. Of
course, the language used does not guarantee that the underlying concepts are understood.
Here, AT takes advantage of the fact that its concepts are grounded in common
knowledge about our own human societies, and so they can be easily internalised by
all the members of the development team.

To verify the social properties in a MAS development, this paper describes a 
method independent of the considered MAS methodology. The process uses 
mappings to translate specifications between the AT and the MAS language. In this 
way, developers do not need to learn a new methodology and can use their own tools. 

The third element of our approach is the set of libraries that collect information
from AT studies and MAS projects. These libraries are repositories of predefined
social properties that developers can use in their projects. Currently, there are two of
these libraries available: one for requirements based on the Activity Checklist, which
currently holds twenty properties, and another for contradictions according to AT
research, which currently includes ten properties. These repositories are integrated with
the INGENIAS Development Kit as a proof of the feasibility of the overall approach.

This ongoing research has three main open issues in its current status. The first
one is about the need for interactive work with the tools. Users have to decide what
properties to check, how they have to be customized, judge their meaning in their
specifications, and select the best way of modifying the system. Although the user’s
judgement will always be necessary, the workload can be reduced with more detailed
patterns and increased reasoning capabilities in the checking method. The second
issue is related to the expressive power of the UML language for AT. This
language has to support the translation of knowledge from sources in the very rich
natural language. Our UML notation cannot support all the features of natural
language but should include a proper set of primitives to transmit the key meanings of
the social properties and allow automated processing at the same time. Finally, the
enrichment of the property libraries is also work to do.

Abstract. Previous research has demonstrated that traffic control models
based on the comparison between a historical archive of information and current
traffic conditions tend to produce better results, usually by improving the system's
proactive behavior. Based on this assumption, we present in this paper MARCS
- Multi-Agent Railway Control System, a multi-agent system for communications-based
train traffic control. For this purpose we have developed a system infrastructure
based on an architecture composed of two independent layers: "Control"
and "Learning".

The "Control" layer is responsible for traffic supervision, regulation, security and
fluidity, and includes three distinct agent types: "Supervisor", "Train" and "Station".

The "Learning" layer, using situations accumulated by the "Control" layer,
will infer rules that can improve traffic control processes, minimizing waiting
time and the stop orders sent to each train. At this moment, inferred rules look like:

"At moment T1, when a train is located at P1 = (x1, y1) with destination E1
and another one is at P2 = (x2, y2) with destination E2, a traffic conflict will occur
in L1 after t1 seconds."

Rules of this kind are transmitted to the control system to be taken into account
whenever a similar traffic situation is about to occur. In the learning process we apply
an unsupervised learning algorithm (APRIORI).



1 Introduction 

1.1 Motivation 

Railroad traffic volume will increase significantly over the next two decades. More
people and merchandise will circulate in increasingly bigger and faster railway
networks [4].

Although traffic scheduling systems guarantee that, for the foreseen conditions,
vehicles in circulation will not compete simultaneously for the same resources (do not
conflict), they lack flexibility in order to enforce security.

Since railway networks operate under dynamic conditions, it is most desirable that the
control system becomes more flexible, knowing how to respond to these new
requirements in an autonomous way.

Our proposal consists of a completely distributed, decentralized and adaptable
architecture for railway traffic control in a communications-based train control setting [3].

1.2 Summary 

The proposed system can be decomposed into two different sub-systems: "Control" and
"Learning".

The "Control" sub-system is responsible for traffic management and guidance in the
network and includes three agent types ("Supervisor", "Train" and "Station"). Those agents
must interact with the objective of providing control mechanisms for the displacement
of each train in the network. In this context, security is the principal concern: the system
tries to ensure that train crashes never occur. Once security is guaranteed, it is then
important to maximize the system's overall efficiency by minimizing conflicts between
trains. In our terms, a train "conflict" occurs when several trains wish to cross the same
place at the same time. When that situation occurs, the control sub-system is responsible
for assigning priority and sending "stop" commands to trains that must wait until the
crossing zone is free.

The "Learning" sub-system is the complementary one and has the objective of analyzing
the descriptions of the system's past accumulated situations and identifying typical cases
that became the origin of later conflicts. The objective is for the learning sub-system to
infer rules that anticipate train conflicts and to enable the "Control" sub-system to benefit
from them.

The "Control" sub-system must compare all current train positions with the ones that
can be identified by each rule. If any match is found, the predicted conflict must be
avoided and all necessary actions must be taken in order to do so.
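
The comparison of current train positions against inferred rules can be sketched as follows. This is a minimal illustration, not the MARCS implementation: the rule fields follow the rule form quoted in the abstract (two positions, two destinations, a conflict location and a delay), while the class names, the function names and the distance tolerance `tol` used to decide that a situation is "similar" are assumptions introduced here.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class TrainState:
    position: Point
    destination: str


@dataclass
class ConflictRule:
    p1: Point
    e1: str
    p2: Point
    e2: str
    location: str   # where the conflict is predicted
    delay: float    # seconds until the predicted conflict


def near(a: Point, b: Point, tol: float) -> bool:
    """Assumed similarity test: Euclidean distance within a tolerance."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= tol


def predicted_conflicts(trains: Dict[str, TrainState],
                        rules: List[ConflictRule],
                        tol: float = 5.0) -> List[Tuple[str, str, ConflictRule]]:
    """Return (train_a, train_b, rule) for every rule matched by two trains."""
    hits = []
    for rule in rules:
        for a, sa in trains.items():
            for b, sb in trains.items():
                if a == b:
                    continue
                if (sa.destination == rule.e1 and sb.destination == rule.e2
                        and near(sa.position, rule.p1, tol)
                        and near(sb.position, rule.p2, tol)):
                    hits.append((a, b, rule))
    return hits
```

Each returned triple identifies the two trains involved and the rule that predicts the conflict, so the control sub-system could, for example, send a "stop" command to one of them before the delay elapses.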

As we will show in section 4, this proactive behavior tends to minimize train conflicts
and, under certain conditions, improve the system's efficiency.

1.3 Related Work 

Probably as a result of increasing traffic congestion problems, multi-agent systems applied
to the transportation domain usually concern road transportation.

Typical approaches tend to divide the covered area among several traffic management
agents, each one taking on decision responsibilities for a specific parcel. Often, authors
decide to implement agents that represent every vehicle in circulation and the physical
infrastructure itself.

"TraMas" [7] is a system aimed at studying the viability of multi-agent systems in
the road transportation domain, as well as testing the applicability of different models and
cooperation strategies in multi-agent systems. In TraMas every cross point is represented
by a traffic agent that is responsible for the respective traffic control. Each of the traffic
control agents is independent enough to decide locally, but is also prepared to share
information (cooperate) with other agents.

The TraMas system includes three layers (Cooperative, Decision and Control).

[8] proposes a peculiar road traffic multi-agent system. Its main objective consists in
coordinating vehicle movements inside a delimited area. Using a GPS-like localization
method, every vehicle is represented by an agent, and each network parcel is managed
by a control agent. Each control agent is responsible for analyzing its specific traffic
volume and constructing the system's essential component: the "co-field". The co-field is
a 3D representation of the environment with two principal characteristics:

- Areas with high traffic density are represented as higher altitude (mountains).

- Areas with low traffic density are represented as lower altitude (depressions).

Vehicle agents are responsible for route selection by minimizing travel cost.
As expected, it is cheaper to travel in descending directions. This induces vehicles
to avoid areas with high traffic density, where they could potentially spend more time as a
result of traffic conflicts.

Fig. 1. MARCS architecture

Proposed in 1994, dMARS [9] is a multi-agent system where two agent types interact:
"Intersection" and "Street". They establish cooperation mechanisms with neighbors in
order to produce an emergent system behavior. Every agent gets as input the traffic
volume in the area it represents, and is responsible for maintaining that information and
providing it to the neighboring agents that request it.

[12] proposes a multi-agent system for train traffic coordination with one peculiar
characteristic: a natural language interface. When a traffic agent feels a high degree of
uncertainty about what the correct action is, it starts a user interaction session. The user
analyzes the current traffic situation and communicates the corresponding action orally.
This information must be stored by the agents and applied in the next similar situation.

Several other multi-agent systems [11] [10] and models [13] have been proposed, each
of them with specific characteristics but sharing two features with the ones referred to
above: the division of the area (and of responsibilities) and the inclusion of cooperation
mechanisms that make it possible to go from a local to a global perspective.



2 MARCS Architecture 

Figure 1 gives a global perspective of the MARCS architecture, the agents existing in both
sub-systems ("Control" and "Learning"), and the interaction mechanisms between them.
Following what has been said before, the figure shows the interaction between four
distinct agent types:

- Supervisor. These agents must control, guide and guarantee security in the traffic
network for each specific area. Each area is delimited by latitude and longitude
coordinates. Supervisor agents are the only ones that belong simultaneously to the
control and learning sub-systems.

- Train. This agent type belongs exclusively to the control sub-system. Train agents
represent the interests of the corresponding trains and are responsible for train velocity
control, depending on the free distances ("distance-to-go") assigned by Supervisor agents.

- Station. Represents the interests of a railway station, and also belongs exclusively
to the control sub-system. The objectives of Station agents consist in administering
platforms, giving orders for train arrivals and departures and providing useful information
to users (passengers in train stations).

- Learning. This agent type belongs to the "Learning" sub-system. The task of Learning
agents consists in requesting the registry log of the control agents and analyzing it to
identify meaningful patterns from which to infer rules that can optimize traffic fluidity.

2.1 Control Sub-system 

The main objective of the control sub-system is to provide secure and efficient routing for
all trains while preventing crash situations and maximizing traffic fluidity.

This is accomplished through information exchange among Supervisor, Train and Station
agents, consisting of both data and plans.

2.2 Learning Sub-system 

As reported above, the MARCS learning sub-system includes two different agent
types: Learning and Supervisor, the latter also being part of the control sub-system.
Performance and efficiency were the primary factors analyzed at design time, leading to
this architecture based on two parallel sub-systems.

Control systems usually have rigid time requirements, which make them suitable for
"real time" operation. On the other hand, learning processes (especially data mining ones)
tend to consume considerable computational resources and take a long time before useful
results become available. For our application, it was crucial that the priority of control
processes was preserved, and that each real-time traffic situation could be effectively
analyzed, with no interference from any other task or objective that an agent could
possibly have.

Another important factor is system modularity, which could also facilitate the overall
implementation. Since learning process tasks are easily distinguished from control ones,
implementing them by means of different computational entities becomes more intuitive.

Based on these requirements, Learning agents are the essential components of the learning
sub-system. Their sole objective consists in requesting the activity registry from Supervisor
agents, concatenating and analyzing it, and inferring rules that potentially increase the
system's efficiency.

At creation time, each Learning agent receives a list of Supervisor addresses and
becomes responsible for periodically asking for their activity records.

Each record consists of a "log" file maintained by each agent. It contains all received and
sent messages and the most relevant actions taken. For example:



1065189498 SEND    tell :sender supervisor1 :receiver train1 :content Distance 290.474074
1065189501 RECEIVE tell :sender simulator1 :receiver supervisor1 :content Location train1 112.2 21.3
1065189501 RECEIVE tell :sender simulator1 :receiver supervisor1 :content Location train2 213.2 -125.9
1065189501 PROCESS Reserve vertex 12 Train train1
1065189501 SEND    tell :sender supervisor1 :receiver train1 :content Distance 230.882501
1065189501 SEND    tell :sender supervisor1 :receiver train2 :content Distance 75.127011
1065189503 RECEIVE tell :sender simulator1 :receiver supervisor1 :content Location train1 117.2 55.1
1065189503 RECEIVE tell :sender simulator1 :receiver supervisor1 :content Location train2 158.4 1.4
1065189503 PROCESS Conflict Vertex 13 Trains 2 train1 train2


After the complete log file has been transmitted, the Supervisor's work within the
learning process is finished; all subsequent tasks are performed by Learning agents,
as described in the next sections.



3 Learning Process 

The proposed learning process consists of analyzing the potential conflict situations
(Section 2.1) that can occur and identifying the train positions that originated those conflicts.
If similar traffic conditions are repeated, the control system must anticipate the conflict and
take the necessary actions to prevent it.

Figure 2 shows the state diagram of the learning process. It consists of a preliminary phase of
data pre-processing, followed by algorithm execution, result analysis and knowledge
acquisition. In a final phase, the new knowledge is sent back to the control sub-system, where
it is expected to contribute to traffic fluidity and system effectiveness by avoiding traffic
conflicts.



[Figure 2: state diagram with the states Select Lines, Select Attributes, Reduce Cases,
Construct Transactions, Run APRIORI, Generate Rules, Filter Rules and Transmit Rules]

Fig. 2. Learning Process (State Diagram)

3.1 Data Pre-processing 

An agent activity record consists of a text file containing all exchanged messages plus
the relevant actions taken.

Every line has the format:

<time> <type> <description>

where type can be one of "send", "receive", "process" or "exception", respectively
for sent messages, received messages, actions taken or exceptions handled, and description
contains additional details about the message or action.
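
The line format above is simple enough to parse with a few lines of code. The following is a minimal sketch in Python, assuming whitespace-separated fields and upper-case type keywords as in the sample log shown earlier; the names `LogLine` and `parse_log_line` are illustrative and are not part of the system described here.

```python
from typing import NamedTuple

# Record types named in the text (written in upper case in the sample log).
VALID_TYPES = {"SEND", "RECEIVE", "PROCESS", "EXCEPTION"}


class LogLine(NamedTuple):
    time: int          # timestamp, first field of the line
    type: str          # one of VALID_TYPES
    description: str   # free text: message or action details


def parse_log_line(line: str) -> LogLine:
    """Split '<time> <type> <description>' into its three fields."""
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError(f"malformed log line: {line!r}")
    record_type = parts[1].upper()
    if record_type not in VALID_TYPES:
        raise ValueError(f"unknown record type: {parts[1]!r}")
    description = parts[2] if len(parts) == 3 else ""
    return LogLine(int(parts[0]), record_type, description)


record = parse_log_line("1065189501 PROCESS Reserve vertex 12 Train train1")
print(record.type, "|", record.description)
```

Keeping the description as free text leaves the later stages (line and attribute selection) free to decide which parts of it matter.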

The pre-processing phase has four stages [5]: line selection, attribute selection,
instance reduction and transaction construction.

At the end, the data is formatted in a way that is suitable for the next learning
phase.



Transaction Construction. The next step consists of building the transactions needed for
algorithm execution. In APRIORI [2] terms, a transaction consists of a set of items
grouped by some criterion. For this purpose, we group the items describing the historic
train locations that later originate a traffic conflict. Identifying a conflict between two
trains ($C_a$ and $C_b$) at location $L_a$ and moment $T_t$ implies selecting the lines describing
the absolute positions of $C_a$ and $C_b$ at the past moments $(t - i)$, $i = 1..n$, all of which are grouped
in the same transaction.

Let $i_{conf}$ be a "Conflict" item at time $t_{conf}$ and $C = \{C_1, C_2, \ldots, C_n\}$ the set of trains
involved. Define $a \in \mathbb{N}$ as the retrospective limit of the analysis. For each line $i_j$ describing the
location of train $C_j$ at time $t_j$: if $t_j < t_{conf}$, $t_j \geq t_{conf} - a$ and $C_j \in C$, then add $i_j$
to the same transaction as $i_{conf}$.

As an example, see the set of items displayed below:

ID   Time   Type          Description          Destination
1    t1     Location      Train C1 (x3, y3)    Destination D1
2    t2     Location      Train C2 (x2, y2)    Destination D1
3    t2     Location      Train C1 (x1, y1)    Destination D1
4    t3     Conflict L1   Trains C1, C2
5    t8     Location      Train C3 (x2, y2)    Destination D1
6    t8     Location      Train C4 (x1, y1)    Destination D1
7    t9     Conflict L1   Trains C3, C4


This set originates two transactions (T1 and T2):

Transaction   Items
T1            1, 2, 3, 4
T2            5, 6, 7
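
The grouping rule can be sketched in Python as follows. This is only an illustration of the rule as stated in the text: the `Item` record, the function name `build_transactions` and the numeric times (t1 = 1, t2 = 2, and so on) are assumptions made for the example, and the retrospective limit is passed as the parameter `a`.

```python
from typing import NamedTuple


class Item(NamedTuple):
    id: int
    time: int        # numeric stand-in for t1, t2, ... (assumption)
    kind: str        # "Location" or "Conflict"
    trains: tuple    # trains named in the line


def build_transactions(items, a):
    """One transaction per Conflict item: the Location items of the
    involved trains with t_conf - a <= t_j < t_conf, plus the Conflict."""
    transactions = []
    for conflict in items:
        if conflict.kind != "Conflict":
            continue
        involved = set(conflict.trains)
        history = [
            it.id
            for it in items
            if it.kind == "Location"
            and conflict.time - a <= it.time < conflict.time
            and it.trains[0] in involved
        ]
        transactions.append(history + [conflict.id])
    return transactions


# The seven items of the example table.
items = [
    Item(1, 1, "Location", ("C1",)),
    Item(2, 2, "Location", ("C2",)),
    Item(3, 2, "Location", ("C1",)),
    Item(4, 3, "Conflict", ("C1", "C2")),
    Item(5, 8, "Location", ("C3",)),
    Item(6, 8, "Location", ("C4",)),
    Item(7, 9, "Conflict", ("C3", "C4")),
]
print(build_transactions(items, a=5))  # [[1, 2, 3, 4], [5, 6, 7]]
```

With a smaller retrospective limit, older location lines fall outside the window and the transactions shrink accordingly.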



3.2 Algorithm for Association Rules 

APRIORI allows the identification of association rules from large data sets grouped
into transactions. As described in [2], let $\mathcal{I} = \{i_1, i_2, \ldots, i_m\}$ be a set of literals,
called items. Let $\mathcal{D}$ be a set of transactions, where each transaction is a set of items $\{i_j\}$,
$j = 1..k$, such that $i_j \in \mathcal{I}$. An association rule is an implication of the form $X \Rightarrow Y$, where
$X \subset \mathcal{I}$, $Y \subset \mathcal{I}$ and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set $\mathcal{D}$ with
confidence $c$ if $c\%$ of the transactions in $\mathcal{D}$ that contain $X$ also contain $Y$. The rule has support $s$
if $s\%$ of the transactions in $\mathcal{D}$ contain $X \cup Y$.

For a transaction set $\mathcal{D}$, the problem of mining association rules consists of generating
all association rules with support and confidence higher than a specified threshold.

The logic of the algorithm is based on the observation that if a given set of attributes
S has support lower than the threshold, any superset of S will also have support below the
threshold, and consequently any effort to calculate the support of such supersets is wasted. For
example, if we know that {A,B} is not supported, it follows that {A,B,C} and {A,B,D}
will not be supported either [6].
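
The definitions of support and confidence, and the pruning observation, can be illustrated with a compact level-wise search. The code below is a simplified sketch and not the implementation used in this work: the function names and the toy transactions are assumptions, and support is expressed as a fraction rather than a percentage.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, keeping support > min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support([i], transactions) > min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset cannot be
        # frequent, so its support is never counted.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates
                 if support(c, transactions) > min_support}
    return frequent


toy = [{"A", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, s in sorted(apriori(toy, 0.4).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

In the toy data {A,B} appears in only one of four transactions, so {A,B,C} is discarded in the prune step without scanning the transactions again.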

3.3 Rule Generation 

After the execution of APRIORI [2], we have identified a group of frequent sets $\mathcal{F}$, such that
$\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$, each $F_i$ being a set of items. Let $F_i = \{i_1, i_2, \ldots, i_n\}$ be a frequent
set of dimension $n$. Adding a constraint that forces the consequent to have only one item,
we build a group of $n$ association rules $R_i$, with $R_i = \{\forall i \in F_i : (F_i - i) \Rightarrow i\}$.

For our purpose, the relevant association rules are those whose consequent has type
"Conflict", meaning that we are interested in identifying the items (states)
that usually occur together with a "Conflict" item.

For every frequent set $F_i$ of dimension $n$, we can identify an association rule set
$R_i$ of dimension $m$ ($m < n$), such that:

$r \in R_i : \{i_{r1}, i_{r2}, \ldots, i_{rk}\} \Rightarrow i_{conf}$, $Type(i_{conf}) =$ "Conflict"

As mentioned above, since only two distinct item types are kept ("Location" and
"Conflict"), all rules identified during the learning process have the form: "IF Train $C_1$ is
at $(x_1, y_1)$ with destination $D_1$ AND Train $C_2$ is at $(x_2, y_2)$ with destination $D_2$ AND
... THEN Conflict in $L_1$, Trains $(C_1, C_2, \ldots)$, Time $t$".
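
A short sketch of this step is given below: for a frequent set, every rule with a single-item consequent is formed and only those whose consequent is a "Conflict" item are kept. The string encoding of items and the names `conflict_rules` and `is_conflict` are hypothetical and serve only to make the example runnable.

```python
def conflict_rules(frequent_set, is_conflict):
    """Build the rules (F - {i}) => i for every i in F and keep those
    whose consequent i is a Conflict item."""
    frequent_set = frozenset(frequent_set)
    rules = []
    for item in sorted(frequent_set):
        antecedent = frequent_set - {item}
        # A rule needs a non-empty antecedent and a Conflict consequent.
        if antecedent and is_conflict(item):
            rules.append((antecedent, item))
    return rules


# Hypothetical item encoding: "loc:<train>@<position>" and "conflict:<place>".
frequent = {"loc:C1@(861.0,-4.9)", "loc:C2@(768.0,-260.0)", "conflict:Vertex0"}
for antecedent, consequent in conflict_rules(
        frequent, lambda item: item.startswith("conflict:")):
    print("IF", " AND ".join(sorted(antecedent)), "THEN", consequent)
```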



Rule Selection. After the rule generation process, it is important to select the most relevant
rules and communicate them to the agents of the control system. The process described above
identifies a large set of association rules, most of them irrelevant. For
instance, an inferred rule that foresees a conflict between trains in the next second
is of little use. On the other hand, if two rules $r_1$ and $r_2$ foresee conflicts for
the same trains at the same place in, respectively, $t_1$ and $t_2$ seconds ($t_1 < t_2$) with the
same support and confidence, then $r_1$ is irrelevant, because another rule
anticipates the same situation at an earlier stage.

For the rule selection process we have defined a lower threshold $l_t$ ($l_t > 0$) and
consider every rule $r_i = (i_1, i_2, \ldots, i_n) \Rightarrow i_{conf}$ with $Time(i_{conf}) < l_t$ irrelevant. In a
second phase we compare the remaining rules with each other, to check whether a
better rule exists for the same situation.

Consider a rule $r_i$ that anticipates a conflict at place $L_i$ in about $t_i$ seconds, and
let $(i_1, \ldots, i_n)$ be the locations of the oldest states attached to the $n$ trains involved in the
conflict at $L_i$. If there is a rule $r_j$ that also anticipates a conflict at $L_i$ in $t_j$ seconds ($t_i < t_j$) with
$(j_1, \ldots, j_n)$ as the respective train locations, and there is only a single possible path
between the respective places $j_x$ and $i_x$, then $r_i$ is irrelevant.
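
The two selection phases can be sketched as follows. The `Rule` record, the function `select_rules` and the predicate `single_path` are illustrative assumptions: in particular, how the system decides that only one path joins two places is not described here, so it is passed in as a callable.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    place: str          # where the conflict is foreseen
    trains: frozenset   # trains involved
    lead_time: float    # seconds between the premises and the conflict
    locations: tuple    # oldest known location of each train, same order


def select_rules(rules, min_lead_time, single_path):
    """Phase 1: drop rules that anticipate the conflict by less than the
    threshold. Phase 2: drop a rule when another rule for the same place
    and trains anticipates it earlier along a single possible path."""
    kept = [r for r in rules if r.lead_time >= min_lead_time]

    def dominated(rule):
        return any(
            other.place == rule.place
            and other.trains == rule.trains
            and other.lead_time > rule.lead_time
            and all(single_path(a, b)
                    for a, b in zip(other.locations, rule.locations))
            for other in kept
        )

    return [r for r in kept if not dominated(r)]


trains = frozenset({"C1", "C2"})
r1 = Rule("Vertex0", trains, 1, ("a1", "a2"))
r2 = Rule("Vertex0", trains, 30, ("b1", "b2"))
r3 = Rule("Vertex0", trains, 61, ("c1", "c2"))
print(select_rules([r1, r2, r3], 5, lambda a, b: True))
```

When the path predicate holds, only the rule with the longest anticipation survives; when it does not, both remaining rules are kept because the earlier state does not necessarily lead to the later one.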
3.4 Rule Transmission

After the rule selection process, a relevant set of rules has been identified and we can
proceed to the last stage: communicating it to the agents of the control sub-system. This process
involves locating the trains, identifying the Supervisor agents responsible
for traffic management at the specific place, and transmitting the rules.

Given a rule $r_i$ such that $r_i = (i_1, \ldots, i_n) \Rightarrow i_{conf}$, we analyze the parameter "Position"
of every item $i_j$, $j = 1..n$, and determine which Supervisor is responsible for those
coordinates. We then transmit the rules in KQML-coded messages
like the one exemplified in section 4.

Such a message informs Supervisor 1 that, if two trains are at positions
(861.0, -4.9) and (714.2, -263.1) respectively and travel towards Station 1 and Station 2, a
traffic conflict will occur in about 61 seconds at Vertex 0, where Vertex 0 is the internal
representation of a specific railway crossing point (switch point).

3.5 Control System Repercussions 

When analyzing the present train positions, Supervisor agents compare them with
every received rule and, on finding a match, start the "conflict avoid" process.

At present, this process consists of asking the involved trains to increase or decrease
their usual velocity by a% during t seconds (a, t > 0), in accordance with the foreseen
conflict time.

Traveling for a time interval at a velocity higher or lower than desired can be
enough for the trains to arrive at the predicted point at different moments, avoiding the conflict
and improving the system's performance.
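
The effect of such an order on the arrival time can be estimated with simple kinematics. The sketch below assumes constant speeds and an instantaneous speed change, neither of which is stated in the text; `arrival_time` and its parameters are illustrative names only.

```python
def arrival_time(distance, speed, pct_change=0.0, duration=0.0):
    """Seconds needed to cover `distance` when the train runs at
    speed * (1 + pct_change / 100) for `duration` seconds and then
    returns to `speed` (constant-speed assumption)."""
    adjusted = speed * (1 + pct_change / 100)
    covered = adjusted * duration
    if covered >= distance:
        # The point is reached while the adjusted speed is still in force.
        return distance / adjusted
    return duration + (distance - covered) / speed


nominal = arrival_time(1000.0, 20.0)
slowed = arrival_time(1000.0, 20.0, pct_change=-10, duration=30)
print(nominal, slowed, slowed - nominal)
```

Under these assumptions, holding a change of a% for t seconds shifts the arrival by a*t/100 seconds (later for a reduction, earlier for an increase), provided the train does not reach the point during the adjustment interval.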



4 Experimental Results 

A traffic control system can be evaluated from multiple perspectives, and the
selection of the most relevant characteristics may involve subjective components.

"Security" should always be the top priority. It is crucial to ensure that the
system provides sufficient security mechanisms to avoid train collisions. "Capacity" and
"Efficiency" can also be used as system evaluation parameters.

In our traffic simulator system, we implemented five evaluation parameters: 

Crashes: Number of train collisions.

Average Velocity: Average velocity of the trains.

STOP: Number of stop orders sent to trains.

Time Proportion at Desired Velocity: Every train has an optimal velocity, according to
its physical characteristics. This parameter represents the proportion of time ([0,1]) that
trains traveled at a velocity near the optimal one.

Simulation Time: Total time spent until all trains arrive at their final stations.

Experiments on several simulation scenarios, like the one displayed in figure 3, allow us
to conclude that the rules inferred by the learning system improved system performance
by reducing the "Stop" and "Simulation Time" parameters and increasing "Average Velocity"
and "Time Proportion at Desired Velocity" without compromising security:

Fig. 3. Traffic simulation process



Parameter                            Before Learning Process   After Learning Process
Crashes (Total)                      0                         0
Stop (Total)                         1                         0
Average Velocity (Km/h)              61                        66
Time Proportion at Des. Vel. [0,1]   0.366                     0.282
Simulation Time (mm:ss)              1:54                      1:48



In the scenario displayed in figure 3, the learning sub-system identified the rule:

(tell :sender Aprender1
      :receiver Supervisor1
      :content (RULE
                 Confidence 1.0
                 Time -67
                 TotalPremisses 2
                 Premisse Local 861.0 -4.9 Destination Station2
                 Premisse Local 768.0 -260.0 Destination Station1
                 Consequent
                 Conflict Vertex 0))



The rule displayed above informs "Supervisor" that if two trains at positions (861.0,
-4.9) and (768.0, -260.0) move respectively to "Station1" and "Station2", a
traffic conflict will occur at vertex 0 ($P_1$).

Repeating the simulation process, we observe that "Supervisor1" asks "C2" to
reduce its optimal velocity, and this action is enough to avoid the conflict at $P_1$.



5 Conclusions and Work in Progress 

Experimental results allow us to conclude that, in specific cases, MARCS performance
has been improved by applying the rules inferred by the learning system.

Extending the scope of the learning process, we intend to apply it to the actions
performed by the system as a result of conflict anticipation. At present, the learning system has
the ability to anticipate conflicts, but it cannot select the best action to avoid them, nor can
it anticipate whether the actions performed to avoid a specific conflict will induce other
future conflicts that are harder to resolve.

Our work is currently focused on analyzing the effects of the actions performed to
avoid conflicts and determining which ones are optimal.

For this purpose, we plan to improve the "line selection" phase so that it also selects "Action"
instances. These elements specify the actions taken by Supervisor agents with the aim
of avoiding a conflict.

With this, we expect to infer new knowledge such as that represented in the following example:

"Having a train $C_1$ located at $P_1 = (x_1, y_1)$ with destination $D_1$ and another train $C_2$ located
at $P_2 = (x_2, y_2)$ with destination $D_2$, a conflict will occur at $L_1$ in $t_1$ seconds. The best
way to avoid this conflict is to ask $C_1$ to decrease its average velocity by 10%. Train $C_2$
must not accelerate because it would conflict with another train ($C_3$) at $L_2$ about $t_2$
seconds later ($t_2 > t_1$)".

Conflict transitivity is another factor that we plan to analyze, grouping conflicts
that involve common trains. We want to derive optimal actions to avoid groups of
conflicts and not just isolated ones.

Abstract. Real-time multimedia streaming applications are becoming increasingly
popular on the Internet; however, the users of these streaming services
often find their quality to be insufficient. When the services are based on multi- 
point videoconferencing, it is difficult to reach a constant, suitable level of 
quality on the video transmissions. In this paper, we propose to increase the 
overall quality of multipoint videoconferencing by dynamically acting on the 
JPEG quality parameter of each individual videoconference. To do this, we as- 
sign a fuzzy agent controller to each single videoconference. Since the agent’s 
fuzzy logic is based on frame ratios (and JPEG qualities), the videoconference 
qualities will be dynamically equilibrated when a bottleneck is reached in any 
of the hardware resources supporting the system. This work includes a complete 
set of tests where the results of the fuzzy agents are compared with the optimum 
values reached by each multipoint videoconference: the proposed fuzzy archi- 
tecture provides very good dynamic control of the videoconference qualities; 
moreover, its use will be particularly interesting in mobile environments, where 
the devices are heterogeneous and present limited processing capabilities. 



1 Introduction 

Multimedia streaming applications are becoming increasingly popular on the Internet; 
however, the users of these streaming services often find their quality to be insuffi- 
cient. To improve this situation, there is currently an increasing amount of research on 
the different types of congestion control on the networks [1]. These controls can be
implemented using different artificial intelligence mechanisms, such as fuzzy logic [2,
3] or agent systems [4], and it is common to combine these mechanisms [5].

Videoconferencing services are becoming more available on the Internet and their 
uses cover a large range of areas [6, 7]. Videoconferencing specifically requires a set 
of resources that are not always available over time [8], and this situation can lead to
the above-mentioned congestion, which needs to be controlled [9]. 

The quality control of video transmissions allows different alternatives to tradi- 
tional networking approaches. One of these alternatives is to operate on the applica- 
tion layer, usually controlling the video coders [10, 11]; this way, we can aim to 
transmit slow video signals over the public network in real-time. 

If a single videoconference communication can cause important congestion prob- 
lems, then multipoint videoconferencing requires careful control of the resources in 
general and the bandwidth in particular [12]. This is a situation where it is advisable to 
act on the application layer to minimize congestion situations on the network layer. 
This paper focuses on this subject. 

Rate control mechanisms are currently being used in multimedia streaming [13,
14]; some of them act on the application layer, while others introduce adaptation
methods for UDP traffic.

This publication describes the design of a fuzzy-based agent controller which dynamically
changes the JPEG quality of its associated videoconference. The paper
does not describe the implementation details: the system has been implemented using 
the Sun Microsystems Java Media Framework API, which provides streaming, captur- 
ing, processing and rendering facilities, as well as suitable RTP access. 

The paper starts with the architecture of the system and continues with its design; 
the architecture section focuses on the agent characteristics of the controllers: each 
controller acts on a single videoconference and implicitly shares the frame ratio varia- 
tions with the rest of the agents. 

The design section includes a study of the main factors of the application level that 
affect the videoconferences and the advisability of incorporating them in the fuzzy 
controller. The kernel of the design section presents the linguistic variables and the 
fuzzy rules. 

The publication includes a complete section of results: real test results and simu- 
lated test results. The simulated results make it possible to compare the quality of 
multipoint videoconferencing in two situations: the optimum situation and the situa- 
tion reached using the fuzzy controllers. 



2 Fuzzy Agent-Based Videoconferencing System 

The fuzzy agent we propose can be applied to any videoconferencing system where 
there are any resources shared by two or more videoconferences. Usually, the most 
critical shared resource will be the communications network, but it is possible to share 
different resources, such as computers acting as senders or receivers of multiple simul- 
taneous videoconferences (Figure 1). 

Each fuzzy agent acts on a different videoconferencing process; therefore, a system
supporting 'n' simultaneous videoconferences will form a fuzzy multi-agent system
consisting of 'n' fuzzy agents. Periodically, each fuzzy agent receives the frame ratio of its
videoconference and dynamically decides the JPEG quality that should
be applied to improve the video quality (in order to maximize the frame ratio and
JPEG quality).

Figure 1 shows the system architecture; senders use a communications network to
deliver video frames using RTP, while receivers take the video streams and communicate
the frame ratio to their corresponding agents. The fuzzy agents use the frame ratio
to dynamically determine the JPEG quality that needs to be applied.




Fig. 1. Fuzzy multi-agent videoconferencing architecture

The fuzzy multi-agent videoconferencing system created can work using limited 
hardware resources and it has been tested for controlling several simultaneous video- 
conferences running on different computers. This means that it is possible to use each 
single computer to support several agents and senders. In this case, some frame ratios 
can be slowed down due to the excess load applied to the computers, and the fuzzy 
agents will determine the new JPEG qualities accordingly. The agents aim to maxi- 
mize the videoconference qualities independently of the resources (network, com- 
puters, coders, effects, etc.) that produce the congestion. 

We must realize that the different agents of the multi-agent system implicitly share 
some important information: the frame ratio variations. When there is any type of 
bottleneck in the system, the frame ratio of the videoconferences involved in the bot- 
tleneck drops, and, therefore, the fuzzy agents will try to compensate for this, probably 
by reducing the JPEG quality of their videoconferences. This action contributes to 
reducing the consequences of the bottleneck and to equilibrating the quality of the 
different videoconferences. 

To explain the JPEG quality/frame ratio relationship, we carried out a test using
four different videoconferences (160x120 to 640x480 resolutions) on a specific hardware
configuration (AMD 2.4 GHz, Logitech camera, Sun Microsystems coder). Figure 2
shows the results. As can be seen, low JPEG qualities give high frame
ratios, and high JPEG qualities (especially over 0.8) give low frame ratios. This is
because low JPEG qualities produce more heavily compressed frames.

Fig. 2. JPEG quality (x-axis) versus frame ratio (y-axis) using different resolutions (z-axis)

It would be possible to design a more sophisticated architecture, where the fuzzy
agents could be parameterized in order to take advantage of the specific configuration
(number of senders and receivers on a computer, CPU power, bandwidth of the
network, device and coder used, resolution of each videoconference, etc.) and the
distribution of each specific process, but the result would be a multi-agent architecture that is
difficult to tune to each specific configuration. Furthermore, 'harmonizing' these
physical parameters would require some type of centralization or complicated distributed
communications.

The solution we propose in this paper provides very efficient results based on a
simple design of the fuzzy multi-agent system; this simple design comes from the fact that
several videoconferences sharing common resources will slow their frame ratios in
any bottleneck situation. Consequently, their corresponding fuzzy agents will independently
reduce the JPEG quality (and therefore the overall bit-rate) as if they were
acting in a more complicated distributed way.

3 Fuzzy Agent Design 

It is necessary to study the impact of the different factors that affect the videoconfer- 
encing quality in order to design a suitable fuzzy agent that works properly on the 
possible different situations and environments. This will be done in the next section 
(3.1). When the above-mentioned factors have been considered, we can study different 
design possibilities, discarding or including some of these factors in the final designed 
approach. The fuzzy solution requires the selected factors to be converted into linguis- 
tic variables and fuzzy rules to be established that will act on the linguistic variables in 
order to provide satisfactory results (section 3.3). 

3.1 Preliminary Considerations 

Since we have determined that the frame ratio will feed the fuzzy agents, it would be 
advisable to study the different factors that can affect it (and that can be controlled by 
the fuzzy agents). Of course, we know the impact of the JPEG quality (Figure 2), but 
there are some other factors: 
Coders: Each implementation of a coder can be more or less efficient; and more importantly,
the actual nature of the coder determines a video compression capability
and a bit-rate and frame-rate result.

Effects: Each frame of a videoconference can be processed to modify the video data, 
often to create effects, such as border enhancement, filters, etc. Depending on the 
relationship between the CPU power, the effect complexity and the resolution, the 
frame rate can drop if the computer cannot manage the effect in real time. 

CPU Power, Resolution, Operating System, etc.: By carrying out an in-depth study 
of the factors that affect the frame ratio of one video transmission, we will find a com- 
bination of the basic factors we have outlined. 

3.2 Proposed Agent Parameters and Technology 

From the previous section, we can determine that it is possible to improve the videoconferencing
frame rates by acting on different factors: JPEG quality, video resolution,
selected coder, CPU power, operating system, etc. Some of the factors, such as
the CPU power or the operating system, can clearly only be applied before starting 
each videoconference, since any attempt to dynamically change the factor would inter- 
rupt the transmission. 

The video resolution and the selected coder could be changed dynamically, but this 
action would interrupt the current RTP video streaming, forcing a new one to be 
started. This would produce a long interruption that cannot be tolerated in the short 
periods the fuzzy agents should be acting to control the overall video quality. 

Acting on the processing load of the effects is an interesting possibility for specific 
videoconferencing systems where effects are present and they can be parameterized. 
For example, we could dynamically change the level of an algorithm that enhances 
fuzzy images depending on the frame rate that the computer running the effect is able 
to obtain (this computer will usually be simultaneously running other time-variant load 
processes such as capturers and coders). This paper will avoid controlling any specific 
effect in order to offer a general solution to the videoconferencing dynamic quality 
control issue. 

We select the JPG quality factor as the single parameter we will change to balance 
and enhance the overall videoconferencing quality. Figure 2 shows its importance in 
achieving this goal. 

From Figure 2, we can see that by increasing the JPEG quality factor (JPG) we
could decrease the frame ratio (FPS) and vice versa; this would lead us to work not
only with the absolute frame ratio and JPEG quality factors, but also with their
corresponding differential ones:

$JPG_{t+1} = \eta(\Delta FPS_t, \Delta JPG_t)$, where $\Delta FPS_t = FPS_t - FPS_{t-1}$ and $\Delta JPG_t = JPG_t - JPG_{t-1}$

Since $\eta$ is not linear and is not easy to determine, we must use an empirical
method to implement it or a more sophisticated artificial intelligence approach.
Predicate-based logic could help to establish the behaviour of $\eta$, due to the natural
way it expresses general rules, such as:

if $\Delta FPS_t$ is negative AND $JPG_t$ is high then $JPG_{t+1}$ is medium

Finally, we considered fuzzy logic to be the most appropriate mathematical tool,
because it can implement fuzzy behaviours and because it is difficult to establish
specific values for the limits of the predicates.

3.3 Fuzzy Agent Details 

By adopting the decisions we made in the design sections, we are able to obtain an
extremely simple and efficient fuzzy controller. Its linguistic variables are: $FPS_t$,
$\Delta FPS_t$, $JPG_t$, $\Delta JPG_t$ and $JPG_{t+1}$.

$\Delta FPS_t$ has two categories: negative (from -15 to 0) and positive (from 0 to 15).
Similarly, $\Delta JPG_t$ also has negative (from -1 to 0) and positive (from 0 to 1) categories.

The fuzzy rules are divided into two main groups: static control rules and dynamic
control rules. The first group deals with the behavioural changes that can be established
by responding to the current state of the system, without looking at the recent
changes the videoconference has experienced; that is to say, the rules with the
$JPG_{t+1} = \eta(FPS_t, JPG_t)$ pattern. The dynamic control rules respond to the parameter
changes, $JPG_{t+1} = \eta(\Delta FPS_t, \Delta JPG_t, JPG_t)$, reacting to the recent changes
experienced in the videoconference parameters:



Table 1. Fuzzy rules

Static control rules

if FPSt is low and JPGt is high then JPGtp1 is medium
if FPSt is low and JPGt is not high then JPGtp1 is low
if FPSt is medium and JPGt is high then JPGtp1 is medium
if FPSt is medium and JPGt is not high then JPGtp1 is low
if FPSt is high and JPGt is high then JPGtp1 is high
if FPSt is high and JPGt is medium then JPGtp1 is medium
if FPSt is high and JPGt is low then JPGtp1 is low

Dynamic control rules

Entering and exiting videoconferencing cases

if I_FPS is positive and I_JPG is positive and JPGt is low then JPGtp1 is medium
if I_FPS is positive and I_JPG is positive and JPGt is not low then JPGtp1 is high
if I_FPS is negative and I_JPG is negative and JPGt is not high then JPGtp1 is low
if I_FPS is negative and I_JPG is negative and JPGt is high then JPGtp1 is medium

Stabilised cases

if I_FPS is positive and I_JPG is negative and JPGt is high then JPGtp1 is medium
if I_FPS is positive and I_JPG is negative and JPGt is not high then JPGtp1 is low
if I_FPS is negative and I_JPG is positive and JPGt is not high then JPGtp1 is low
if I_FPS is negative and I_JPG is positive and JPGt is high then JPGtp1 is medium
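
To make the rule base concrete, the seven static control rules can be evaluated as in the sketch below. Only the rules themselves come from Table 1; the membership functions, the use of min for "and", 1 - membership for "not", and the weighted-average defuzzification over the output centres 0, 0.5 and 1 are assumptions chosen for illustration, since these details are not specified in the text.

```python
def low(u):
    """Assumed membership: 1 at u = 0, falling to 0 at u = 0.5."""
    return max(0.0, 1.0 - 2.0 * u)


def medium(u):
    """Assumed membership: triangle peaking at u = 0.5."""
    return max(0.0, 1.0 - abs(2.0 * u - 1.0))


def high(u):
    """Assumed membership: 0 up to u = 0.5, rising to 1 at u = 1."""
    return max(0.0, 2.0 * u - 1.0)


# Assumed output centres used for defuzzification.
CENTRE = {"low": 0.0, "medium": 0.5, "high": 1.0}


def static_control(fps, jpg, max_fps=15.0):
    """Next JPEG quality from the static control rules of Table 1."""
    f = min(max(fps / max_fps, 0.0), 1.0)   # FPS normalised to [0, 1]
    j = min(max(jpg, 0.0), 1.0)
    rules = [
        (min(low(f), high(j)), "medium"),
        (min(low(f), 1.0 - high(j)), "low"),
        (min(medium(f), high(j)), "medium"),
        (min(medium(f), 1.0 - high(j)), "low"),
        (min(high(f), high(j)), "high"),
        (min(high(f), medium(j)), "medium"),
        (min(high(f), low(j)), "low"),
    ]
    total = sum(weight for weight, _ in rules)
    if total == 0.0:
        return j   # no rule fires: keep the current quality
    return sum(weight * CENTRE[label] for weight, label in rules) / total


for fps, jpg in [(15.0, 1.0), (0.0, 1.0), (7.5, 0.7)]:
    print(fps, jpg, round(static_control(fps, jpg), 3))
```

The dynamic control rules could be added in the same way, with the two-category variables I_FPS and I_JPG as extra inputs.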



4 Results 

We have run a set of real multipoint videoconferencing processes to test the fuzzy
controllers. Figure 3 (left) shows the average frame ratio reached for two simultaneous
videoconferences using different JPEG qualities (x and z axes). Figure 3 (right) shows
the videoconferencing qualities ($Q$) obtained, where $Q_t = \frac{1}{25} FPS_t + \frac{1}{5} JPG_t$. $FPS_t$
varies from 0 to 15 and $JPG_t$ varies from 0 to 1, so the FPS term (at most 0.6) has 3 times
more weight than the JPEG term (at most 0.2), and $Q$ varies from 0 to 0.8.
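
The quality measure is straightforward to compute; the short sketch below evaluates it at the extremes of both ranges. The function name `quality` is an illustrative choice.

```python
def quality(fps, jpg):
    """Q = FPS / 25 + JPG / 5, with FPS in [0, 15] and JPG in [0, 1]."""
    return fps / 25.0 + jpg / 5.0


# Contribution of each term at its maximum, and the overall maximum.
print(round(quality(15, 0), 3))   # FPS term alone
print(round(quality(0, 1), 3))    # JPEG term alone
print(round(quality(15, 1), 3))   # both at their maximum
```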

The arrows inside Figure 3 (right) represent the 3 iterations the fuzzy controller 
executes to reach its maximum quality result. 



Fig. 3. Frame ratio (left) and quality (right) applying different JPEG qualities (x & z axes) 

In order to test the fuzzy agent’s behaviour exhaustively, we have used a net- 
work/system simulator configured to determine the different frame ratios obtained by 
applying 3 simultaneous videoconferences. We must provide the simulator with the 
following parameters: bandwidth of the network, resolutions and JPEG qualities of the 
3 videoconferences and the MHz of the 3 computers we use to send the RTP video 
streams. The CPU powers have been set to 2000, 2400 and 3000 MHz for the test we will
explain in this section. The maximum frame ratio values have been set to 15; this can
be considered as a good balance for multipoint videoconferencing systems. 

Figure 4 (left) shows the frame ratios calculated by the simulator for different
bandwidths in Kbps (x-axis), using three videoconferences of 160x120 (low),
320x240 (medium) and 640x480 (high). Figure 4 (right) shows the frame ratio obtained
by varying the JPG quality (x-axis) and the bandwidth (z-axis) at a resolution of
320x240.

Fig. 4. Frame ratios generated by the system simulator (legend: low, medium, high)



Using the simulator we have designed a first test where we calculate a good approximation
of the average maximum video quality ($Q_t = \frac{1}{25} FPS_t + \frac{1}{5} JPG_t$, $Q \in [0, 0.8]$)
reached by applying 3 different resolutions (three 160x120, three 320x240
and three 640x480 videoconferences). For this purpose, for each bandwidth value, the
simulation processes the $11^3$ combinations that can be obtained by applying JPEG
qualities from 0 to 1 in steps of 0.1. We will call the results “Simula_Max”. Next, we run
the fuzzy agent for each bandwidth to calculate the quality it is able to obtain. Our aim
is to get all the results close enough to the calculated maximums (the optimum values).
This would mean that our fuzzy agent is dynamically choosing adequate JPEG
qualities; we will call the results obtained “Fuzzy_Average”. Figure 5 shows the test
results; as can be seen, the fuzzy agent presented works well.
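
The exhaustive search behind “Simula_Max” can be sketched as below. The enumeration of the $11^3 = 1331$ JPEG quality combinations follows the text; the function `toy_frame_ratios` is a purely hypothetical stand-in for the network/system simulator (equal bandwidth share, frame size growing with JPEG quality) and its constants are arbitrary.

```python
from itertools import product


def quality(fps, jpg):
    """Q = FPS / 25 + JPG / 5."""
    return fps / 25.0 + jpg / 5.0


def toy_frame_ratios(jpgs, bandwidth_kbps, max_fps=15.0):
    """Hypothetical simulator stand-in: every stream gets an equal share of
    the bandwidth and its frame size grows linearly with JPEG quality."""
    share = bandwidth_kbps / len(jpgs)
    return [min(max_fps, share / (20.0 + 200.0 * q)) for q in jpgs]


def simula_max(bandwidth_kbps, simulate=toy_frame_ratios, n=3):
    """Try every combination of JPEG qualities 0.0, 0.1, ..., 1.0 for the
    n videoconferences and return (best average quality, best combination)."""
    levels = [i / 10 for i in range(11)]
    best_quality, best_combo = -1.0, None
    for combo in product(levels, repeat=n):   # 11**n combinations
        ratios = simulate(combo, bandwidth_kbps)
        average = sum(quality(f, q) for f, q in zip(ratios, combo)) / n
        if average > best_quality:
            best_quality, best_combo = average, combo
    return best_quality, best_combo


for bandwidth in (300, 3000, 20000):
    print(bandwidth, simula_max(bandwidth))
```

The fuzzy agent's average quality for the same bandwidth can then be compared with the first element of the returned pair.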
Fig. 5. Most favourable qualities and their corresponding fuzzy agent results (x-axis: bandwidth,
y-axis: video quality $Q \in [0, 0.8]$)



The last test included in this work (Figure 6) calculates “Simula_Max” and 
“Fuzzy_Average” for all the cases obtained by combining seven different sets of 
videoconference resolutions and the 33 bandwidth values ranging from 300 Kbps to
20000 Kbps. Figure 6 (left) shows the “Simula_Max” optimum values; Figure 6 
(right) shows the fuzzy agent results “Fuzzy_Average”. It can be seen that the fuzzy 
agent provides a very good approximation to the most favourable results. The flat 
area on the right figure can be improved by increasing the number of categories in 
the JPG’s linguistic variables. 




Fig. 6. Most favourable qualities (left) and their corresponding fuzzy agent results (right);
x-axis: bandwidth, z-axis: resolution (1 means three simultaneous 160x120 videoconferences
and 7 means three simultaneous 640x480 videoconferences)



5 Conclusions 

It is possible to significantly increase the overall quality of multipoint
videoconferencing by dynamically acting on the JPEG quality parameter of each
individual videoconference. To do this, we can assign a fuzzy agent controller to
each videoconference. Since the fuzzy logic of the agents is based on the frame
ratios (and JPEG qualities), the videoconference qualities will dynamically
equilibrate when a bottleneck is reached in any of the hardware resources
supporting the system.

The fuzzy agents are simple and efficient, and they do not require initializing or
tuning hardware-dependent parameters. The test results show that the JPEG qualities
obtained by the fuzzy rules are very close to the optimum values, and these results
can be improved further by incorporating new categories into the JPEG linguistic
variables.

The proposed architecture runs on the application layer and is therefore
compatible with the typical network congestion-level algorithms. Furthermore, its
use will be particularly interesting in mobile environments, where the devices are
heterogeneous and have limited processing capabilities. This idea will be the
focus of our future work, combined with the dynamic control of real-time specific
parameterized effects and their impact according to the CPU power of the
multipoint videoconferencing senders.
Abstract. Current agent architectures provided by MAS platforms impose
limitations on developing the functionality of software agents, which is built
from scratch with little emphasis on (re)configuration and (re)use. This
paper presents an approach to developing software agents using a
component and aspect-based architecture that promotes building agents from
reusable software components and the configuration of software agents.
The basis of our architecture is the use of component-based and aspect-based
software development concepts to separate agent functionality into independent
entities, increasing the extensibility, maintainability and adaptability of the agent to
new environments and demands. The architecture simplifies the software agent
development process, which can be reduced to describing the agents'
constituent components and supported agent interaction protocols in XML
documents. In addition, the extensibility provided by the component orientation
makes it possible to extend and reconfigure the internal agent architecture to provide
additional agent capabilities such as planning.



1 Introduction 

The widespread adoption of agent technology depends, among other things, on
making it easier to develop and deploy multi-agent systems on existing platforms.
Constructing software agents with any of the most widely accepted agent platforms
can be a complex and error-prone task that requires the developer to be skilled in
a specific programming language, usually an object-oriented one. In practice,
developers must become experts in an API provided by a platform vendor, so they
are normally not interested in spending more time learning how to use other agent
platforms. Our aim is to ease the developer's work by reducing the programming
task and promoting the (re)use of agents across multiple platforms.

Current agent architectures are mainly implemented as object-oriented
frameworks [1,2,3] that provide a collection of extensible classes modeling typical
agent concepts. Component-Based Software Engineering (CBSE) [4] is a product of
the natural evolution of object orientation, endowing object-oriented languages with
assembly capabilities beyond inheritance. While objects are expressed at the
language level, components are expressed principally through their public
interface, promoting black-box reuse [5]. We propose a component-based
architecture for developing software agents that decomposes agent functionality into
independent components, i.e. in-house or COTS (Commercial Off-The-Shelf)
components, or Web services [7].

Several benefits derive from developing software agents with a component-based
architecture. For instance, agents can be constructed mainly by composing reusable
COTS components, which reduces development time, cost and effort. Using thoroughly
tested COTS components would be a significant gain for agents, since it can save
hours of development and the resulting agents should be more reliable. Moreover,
the resulting component-based agents are more adaptive and flexible, as the agent
functionality is coded independently of other agent-specific classes.

However, decomposing a system, and agents in particular, into context-independent
components is not a trivial task. Commonly, the same concern ends up spread over
different components, creating undesirable dependencies among them. Following
the separation of concerns principle, these concerns should be identified and
separated to cope with complexity. As AOSD (Aspect-Oriented Software
Development) proposes [6], these concerns are modelled as first-class software
entities. Our agent architecture separates functionality from the coordination and
distribution aspects, modelling them as different, decoupled entities. In this way,
agent functionality is provided by reusable software components, which offer
agent core services as well as application-dependent functionality and are inserted
into the agent as plug-ins. The composition of the agent's internal components is
performed at runtime, allowing agent behaviour to be reconfigured and adapted
to support new interaction protocols and functionality.

The coordination issue is separated into an independent entity, decoupling agent
functionality from the agent interactions in which it is involved. Likewise,
platform-dependent features, such as the distribution of messages using the
FIPA-compliant message transport service, are handled by a distribution aspect,
and the encoding of messages in a concrete FIPA representation is enclosed in yet
another entity. Since platform dependencies are encapsulated as external plug-ins,
our agents can be adapted to engage in any platform. Therefore, instead of defining
just another agent platform, we propose: “write your agent once, run it on any
FIPA-compliant platform”.

Another important contribution of our approach is that an agent can be
“programmed” simply by editing XML documents that describe the agent's constituent
components. The developer only has to provide an explicit description of the
supported interaction protocols, a description of the components that will be
assembled inside the agent architecture, and other information related to MAS
deployment.

This paper is organized as follows. Section 2 presents our
component-based agent architecture, describing the main entities of our model.
Section 3 illustrates through an example how to extend the agent architecture to
incorporate a planner that will be used to plan agent actions, and how to program an
agent using this feature. Finally, some concluding remarks are given in Section 4.

2 The Component and Aspect-Based Agent Architecture 

The design of any agent architecture can be approached from a software engineering
perspective, viewing an agent as a complex piece of software that is involved in
multiple concurrent tasks and requires interactions with other agents. Our architecture
models key agent concepts such as functionality, coordination, and agent
communication representation and distribution as separate entities (Fig. 1).
Currently, this architecture is implemented in Java.




Fig. 1. Component and Aspect-Based Architecture for Software Agents



Agent Interface 

The AgentDescription component contains the agent’s public interface, which is an
extension of the traditional software component’s public interface adapted to software
agents. In our case, the agent offers a public interface that includes a description of
the agent’s functionality, a list of communication protocols supported by the agent,
the supported ACL formats of incoming and outgoing messages, and a list of the agent
platforms in which the agent may engage. To unify the description of software
components' public interfaces while avoiding proprietary interface description
languages (IDLs), we propose the use of OWL-S (formerly known as DAML-S).
OWL-S consists of a set of ontologies that provide a vocabulary to describe services. The
OWL-S profile and model descriptions provide enough information to enable
automated execution based on well-defined descriptions of a service’s inputs and
outputs. The use of ontologies makes it possible to share a semantic agreement about
the services provided by and included in an agent, and not just a syntactic one as
with traditional IDLs.

Functionality 

In our architecture, components labelled as «FunctionalComponent» (see Fig. 1)
encapsulate data and functionality, and are always present in the architecture,
providing the general-purpose agent functionality. The basic functionality that every
agent possesses, such as locally storing and accessing shared data and constructing
and sending messages, is provided by the KnowledgeBase and
BasicAgentActions components. An additional feature, though considered basic in our
architecture, is training the agent in a new protocol [8], which is supported by the
TPfNActions component. On the other hand, application- and domain-specific
functionality can be provided by any software component, i.e. a COTS component or a
Web service [7], by means of its offered services, which are accessed through public
interfaces [5]. In current agent architectures, the agent behaviour is encapsulated in
units such as tasks or actions, and its implementation must adhere to a set of
inheritance relationships and interface realizations imposed by a traditional
object-oriented API. In contrast, our model design does not impose
any restrictions on the software components added to the agent: they are
only required to describe their provided interface in OWL-S. These components may
be plugged into the agent on demand and are easily changeable over the lifetime of
the agent, enabling software agent evolution.

Coordination 

In our architecture, an independent entity denoted as «Connector» controls the
ongoing conversations in which the agent is involved, and coordinates the execution
of the agent functionality that is invoked during a conversation. We do not
propose a centralized coordination engine to control agent interactions; rather, an
independent connector deals with each agent interaction (see Protocol Connector in
Fig. 1). Each connector coordinates a dialogue according to a specific protocol (e.g. the
English Auction Protocol), which is not hard-coded inside it. The coordination protocol
is described in XML and is given to the connector at instantiation. We provide an
XML Schema that any specific protocol description must follow in order to
be properly processed by a connector. The protocol description contains not only
the rules to coordinate a conversation, but also a detailed plan for
conducting a role, which states the set of agent actions to be executed. These agent
actions may refer to any service included in the agent interface and provided by a
software component.
The description of an interaction protocol consists of the description of the
interchanged messages and the description of the roles participating in the interaction.
Each role is depicted by a separate finite state machine, represented by a set of state
transition rules. Each transition description also includes the actions that the agent
carries out when receiving a message or an internal event in the context of the
interaction, expressed in an implementation-independent way by means of OWL-S.

At runtime, an agent can acquire a new protocol simply by uploading the
corresponding XML description through the TPfN (Training Protocol for Negotiation)
protocol [8] defined for that purpose. Hence, we provide a single connector
implementation that can coordinate any protocol described according to an
XML schema defining the information required to describe a protocol.
This implementation is described in more detail in [13].
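
A single, protocol-agnostic connector of this kind can be sketched as a small interpreter over a table of transition rules. The sketch is written in Python for brevity (the architecture itself is implemented in Java and reads its rules from XML); the class name, the rule layout and the simplified bidder role are illustrative assumptions, not the schema used by the authors.

```python
class Connector:
    """Coordinates one conversation for one role from a table of transition rules."""

    def __init__(self, rules, initial_state, invoke):
        # rules maps (state, message_type) -> (next_state, [action names]).
        self.rules = rules
        self.state = initial_state
        self.invoke = invoke  # callback that executes a named agent action

    def handle(self, message_type, content=None):
        """Fire the transition for an incoming message and return the new state."""
        key = (self.state, message_type)
        if key not in self.rules:
            raise ValueError(f"no transition from {self.state!r} on {message_type!r}")
        next_state, actions = self.rules[key]
        for action in actions:
            self.invoke(action, content)
        self.state = next_state
        return self.state


# HYPOTHETICAL, heavily simplified bidder role; in the architecture the rules
# would be loaded from an XML protocol description instead of a literal.
BIDDER_RULES = {
    ("waiting", "cfp"): ("bidding", ["evaluate_offer", "send_proposal"]),
    ("bidding", "cfp"): ("bidding", ["evaluate_offer", "send_proposal"]),
    ("bidding", "accept-proposal"): ("won", ["pay"]),
    ("bidding", "reject-proposal"): ("lost", []),
}

executed = []  # records the action names the connector asks the agent to run
bidder = Connector(BIDDER_RULES, "waiting", lambda name, content: executed.append(name))
bidder.handle("cfp", {"price": 10})
bidder.handle("accept-proposal")
```

Under these assumptions, supporting a new protocol means supplying another rule table; the connector code itself does not change.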

Dynamic Composition 

The «Mediator» component AgentCompositionalCore (ACC) (shown in Fig. 1)
performs the dynamic composition of components and connectors, enabling loose
coupling between them. The mediator maintains relevant information about the
components that provide agent functionality and the active conversations controlled by
connectors. This information is represented in the architecture by the object
AgentContext (see Fig. 1), which holds references to all component and connector
instances, and uses the descriptions contained in the AgentDescription component to
create connectors and components. Active connectors (ConnectorInstance) are
named by their conversation identifier, and every component (ComponentInstance) is
identified within the agent architecture by a kind of role name. Since the role a
component plays inside an agent is determined by the ontology it commits to, we use the
ontology name to identify components. The ComponentInstance class has been
subclassed to account for the different types and features of plugged software
components (Java Bean, Web Service, etc.).

Since components and connectors hold no direct references to each other, their
composition is performed at runtime by the ACC, which handles connector requests to
execute agent functionality. Dynamic composition allows a component
instance to be replaced by another implementation, upgrading the agent without
replacing it. Moreover, the agent configuration can change to meet a certain
requirement, such as efficiency. For example, an agent can test different Web service
providers offering the same service and use the one with the best current response
time. In addition, the ACC is in charge of dispatching incoming messages and internal
events to connectors. The conversation identifier of an incoming message is checked to
determine whether the message belongs to an active conversation or starts a new one.
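
The mediator's two jobs, late binding of components by ontology name and routing of messages by conversation identifier, can be illustrated with the following Python sketch. It is not the authors' Java implementation: the method names, the message dictionary keys and the stand-in connector and component classes are all assumptions made for the example.

```python
class AgentCompositionalCore:
    """Mediator: components and connectors only know the ACC, never each other."""

    def __init__(self, connector_factory):
        self.components = {}  # ontology name -> component instance
        self.connectors = {}  # conversation identifier -> connector
        self.connector_factory = connector_factory

    def plug(self, ontology, component):
        """Register a component, replacing any previous one for that ontology."""
        self.components[ontology] = component

    def invoke(self, ontology, service, *args):
        """Resolve a connector's service request at call time."""
        return getattr(self.components[ontology], service)(*args)

    def dispatch(self, message):
        """Route a message to its conversation's connector, creating one if new."""
        conv_id = message["conversation_id"]
        if conv_id not in self.connectors:
            self.connectors[conv_id] = self.connector_factory(message["protocol"])
        return self.connectors[conv_id].handle(message)


class EchoConnector:
    """HYPOTHETICAL stand-in connector that just counts the messages it sees."""

    def __init__(self, protocol):
        self.protocol, self.seen = protocol, 0

    def handle(self, message):
        self.seen += 1
        return self.seen


class SlowQuotes:  # hypothetical provider of a "price" service
    def price(self, item):
        return ("slow", item)


class FastQuotes:  # hypothetical alternative provider of the same service
    def price(self, item):
        return ("fast", item)


acc = AgentCompositionalCore(EchoConnector)
acc.plug("Quotes", SlowQuotes())
first = acc.invoke("Quotes", "price", "block")
acc.plug("Quotes", FastQuotes())  # swap the implementation at runtime
second = acc.invoke("Quotes", "price", "block")
```

Because callers name only the ontology and the service, swapping `SlowQuotes` for `FastQuotes` requires no change on the connector side.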

Message Representation 

The AgentACLRepresentation component allows the agent to communicate using
different ACL representations. This component hides representation dependencies by
defining a high-level interface, ACLParser (shown in Fig. 1), to send
and receive messages in different ACL representations. A separate parser supports
the parsing and encoding of messages in each concrete ACL representation, such as the
XML, String or BitEfficient encoding formats defined in the FIPA specifications. This
component does not care about the source agent platform of the message, as ACL
encoding is independent of the delivering platform.

Message Distribution 

By separating the distribution of messages from functionality and coordination, it
is possible to use different agent platform services for message delivery. In the
architecture, the distribution aspect hides platform dependencies by defining a
high-level interface to send and receive messages to and from different agent
platforms. This separation allows the agent to be adapted to use the services of
different agent platforms. It avoids tying agent development to platform dependencies
and simplifies the management of communication [12]. In the architecture, the
distribution aspect is encapsulated in the Distribution component (shown in Fig. 1).
For every agent platform, this component instantiates a specialized component
realizing the interface MTSAdapter, which encapsulates the corresponding
platform-dependent functionality and lets the agent engage with different agent
platforms by providing access to the corresponding Message Transport Service (MTS).
Separating platform-specific code into different components reduces the complexity of
managing communication with multiple agent platforms. Since agent bootstrapping and
MTS access can differ, every MTSAdapter implements the platform-specific mechanisms
needed to send and receive legal messages to and from a concrete agent platform and
to use other platform services. The Distribution component determines the appropriate
adapter by using a distribution table of (AID, platform identifier) pairs. Our
implementation currently supports adapters for the FIPA-OS, JADE, and Zeus agent
platforms.
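
The adapter lookup through the distribution table can be sketched as below, in Python rather than the Java of the actual implementation. Only the names MTSAdapter and Distribution come from the text; the method signatures, the recording adapter and the agent identifiers are assumptions made for illustration, and no real platform API is called.

```python
from abc import ABC, abstractmethod


class MTSAdapter(ABC):
    """Platform-specific access to a Message Transport Service."""

    @abstractmethod
    def send(self, receiver_aid, encoded_message):
        """Deliver an already encoded ACL message to receiver_aid."""


class RecordingAdapter(MTSAdapter):
    """HYPOTHETICAL adapter that records messages instead of using a real platform."""

    def __init__(self, platform):
        self.platform = platform
        self.sent = []

    def send(self, receiver_aid, encoded_message):
        self.sent.append((receiver_aid, encoded_message))
        return self.platform


class Distribution:
    """Chooses an adapter through a distribution table of (AID, platform) pairs."""

    def __init__(self):
        self.adapters = {}  # platform identifier -> MTSAdapter
        self.table = {}     # AID -> platform identifier

    def register_adapter(self, platform, adapter):
        self.adapters[platform] = adapter

    def register_agent(self, aid, platform):
        self.table[aid] = platform

    def send(self, receiver_aid, encoded_message):
        platform = self.table[receiver_aid]  # KeyError if the AID is unknown
        return self.adapters[platform].send(receiver_aid, encoded_message)


distribution = Distribution()
jade, fipaos = RecordingAdapter("JADE"), RecordingAdapter("FIPA-OS")
distribution.register_adapter("JADE", jade)
distribution.register_adapter("FIPA-OS", fipaos)
distribution.register_agent("supplier1@hostA", "JADE")     # made-up AIDs
distribution.register_agent("supplier2@hostB", "FIPA-OS")
used = distribution.send("supplier2@hostB", "(request ...)")
```

The rest of the agent only calls `Distribution.send`, so supporting another platform amounts to registering one more adapter.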

3 The Development of a Planning Software Agent 

This section explains how to build a planning agent using our approach. We have
chosen this example to show that our agent architecture can be easily extended to
cope with new agent capabilities, such as planning. In addition, this example
shows how our architecture supports the development of agents that do not include a
fixed set of predefined plans defining their behavior. Instead, an inner planner
generates the agent's plan of actions at runtime. Currently, just a few FIPA-compliant
general-purpose agent architectures, such as Zeus, explicitly allow the
definition and use of plans to define agent behavior. In other well-known
FIPA-compliant agent architectures, where agent behavior is modularized and separated
into tasks, it is quite difficult to add new functionality that provides the
agent with planning capabilities, by which we mean the capability of generating a
plan, that is, a sequence of agent actions. In fact, the JADE agent architecture
includes a scheduler, not a planner. As for the FIPA-OS agent architecture, it defines
a Planner Scheduler component that is not currently available. In our case, extending
the agent architecture with planning capabilities only entails adding an internal
planner component in charge of creating plans of actions that could achieve the
agent’s goals.

The agent that illustrates this example is a planning agent for the block world
domain, a well-known domain in the planning research community. This agent owns
a set of blocks, and its goal is to achieve a certain configuration of blocks. For this
purpose, the agent is able to move its blocks, but it can also request blocks from other
agents in order to achieve the required configuration. In our model, agent
functionality, such as moving blocks, is provided by services offered by plugged
components, and the interaction with other agents is controlled by means of
FIPA-compliant protocols.

To develop and deploy an agent using our approach, we provide an XML
deployment document (such as the one depicted in Fig. 2) that the developer fills in
with the initial configuration and properties of the agent. This information is used to
configure the agent architecture. The description of the planning agent in our
example consists of a component that provides a service for generating plans, and
another component that implements the typical actions of the block world domain,
likewise offered as services of its provided interface. The components that are
initially plugged into the agent architecture are packaged into the Functionality
element (see Fig. 2). Within this element, and for each plugged component, the
element Component encloses information about its provided interface and its
implementation. The component interface describes the set of offered services in an
XML document in OWL-S, referenced in the InterfaceDescription element. The
DeploymentInfo element points to an XML document containing information about the
component implementation: the kind of implementation (e.g. Java, CORBA,
Web service), how to locate and deploy the component, etc. As detailed in Fig. 2, the
first Component element includes the information bound to the BWPlanner
component, that is, the XML document that describes, in OWL-S, the provided
interface of the planner component, and the information for deploying the component.
Likewise, the second Component element (see Fig. 2) details similar information
relating to the second component, namely BlockWorld.

Within the Coordination element (see Fig. 2), the designer must specify the
coordination protocols supported by the planning agent. As stated in the previous
section, agent coordination and the communication between the agent's inner
components are decoupled from agent functionality and described in XML documents;
therefore, this section of the agent description includes references to the XML
protocol description documents. Each XML protocol description is referenced in the
href attribute of a ProtocolDescription element. The coordination of our planning
agent is given by three coordination protocols. The first ProtocolDescription element
in Fig. 2 refers to BWPlanning.xml, which describes the coordination among the
BWPlanner component, the BlockWorld component and the BasicAgentActions
component (shown in Fig. 1). This protocol governs the initiation, negotiation and
execution of a plan. In addition, the planning agent supports two
coordination protocols that involve interaction with other agents. The second
ProtocolDescription element in Fig. 2 points to the description of the FIPA
Contract-Net protocol for contracting the block supplier. The third ProtocolDescription
element in Fig. 2 points to the document containing the XML description of the FIPA
Request protocol for requesting a block from another agent.
<?xml version="1.0" encoding="UTF-8"?>
<AgentDescription xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\map\tesis\xml\AgentDescriptionSchema.xsd">
  <Behaviour>
    <Functionality>
      <Component>
        <InterfaceDescription href="http://map:8081/idl/BWPlanner.owl" notation="OWL-S"/>
        <DeploymentInfo href="http://map:8081/idl/PlannerDeployment.xml" notation="String"/>
      </Component>
      <Component>
        <InterfaceDescription href="http://map:8081/idl/BlockWorld.owl" notation="OWL-S"/>
        <DeploymentInfo href="http://map:8081/idl/BlockWorldDeployment.xml" notation="String"/>
      </Component>
    </Functionality>
    <Coordination>
      <ProtocolDescription href="http://map:8081/protocol/BWPlanning.xml"/>
      <ProtocolDescription href="http://map:8081/protocol/BWFIPAContract_Net.xml"/>
      <ProtocolDescription href="http://map:8081/protocol/BWFIPARequest.xml"/>
    </Coordination>
  </Behaviour>
  <Distribution>
    <AgentPlatform adaptor="http://map:8081/adapters/fipaosAdapter.class" platformName="FIPA-OS"/>
  </Distribution>
  <ACLRepresentation ACLParser="http://www.altova.com/ACLString.class" format="String"/>
  <KnowledgeBaseContent>
    +<AcquaintanceDatabase resource="BlockSuppliers">
  </KnowledgeBaseContent>
  <ActiveContext>
    <StartProtocol protocolID="BWPlanning" ontologyID="BlockWorld" role="planning">
      <hasInput>
        +<Input named="suppliers">
      </hasInput>
      <hasInput>
        +<Input named="goal">
      </hasInput>
    </StartProtocol>
  </ActiveContext>
</AgentDescription>



Fig. 2. XML deployment document of a planning agent 



Within the Distribution element, the developer details the agent platform adaptors,
including a reference to their implementation. The AgentPlatform element in Fig. 2
specifies the adaptor for the FIPA-OS platform. The ACLParser plug-in formats ACL
messages in the String format; this information is given inside the
ACLRepresentation element in Fig. 2.

The description of the planning agent also includes, enclosed in the
KnowledgeBaseContent element, the initial content of the agent Knowledge Base,
expressed in terms of beliefs, goals, conditions, and acquaintance databases, each of
which defines a set of identifiers of the agents with which the agent will interact. As
detailed in Fig. 2, the knowledge base component of the planning agent initially
contains a list of block supplier agents, defined as an acquaintance database named
BlockSuppliers. Finally, within the ActiveContext element, the developer specifies the
initial behaviour the agent will execute, expressed in OWL-S. The ActiveContext
element of Fig. 2 encloses the start of the planning protocol.
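
To make the role of the deployment document concrete, the following Python sketch reads an abbreviated, cleaned-up copy of the Fig. 2 document and collects the references an agent bootstrap would need. The function `load_agent_config` and the shape of the returned dictionary are assumptions made for illustration; the collapsed nodes of the figure are omitted because their content is not shown in the paper.

```python
import xml.etree.ElementTree as ET

# Abbreviated, cleaned-up copy of the Fig. 2 document (collapsed nodes omitted).
DEPLOYMENT_XML = """
<AgentDescription>
  <Behaviour>
    <Functionality>
      <Component>
        <InterfaceDescription href="http://map:8081/idl/BWPlanner.owl" notation="OWL-S"/>
        <DeploymentInfo href="http://map:8081/idl/PlannerDeployment.xml" notation="String"/>
      </Component>
      <Component>
        <InterfaceDescription href="http://map:8081/idl/BlockWorld.owl" notation="OWL-S"/>
        <DeploymentInfo href="http://map:8081/idl/BlockWorldDeployment.xml" notation="String"/>
      </Component>
    </Functionality>
    <Coordination>
      <ProtocolDescription href="http://map:8081/protocol/BWPlanning.xml"/>
      <ProtocolDescription href="http://map:8081/protocol/BWFIPAContract_Net.xml"/>
      <ProtocolDescription href="http://map:8081/protocol/BWFIPARequest.xml"/>
    </Coordination>
  </Behaviour>
  <Distribution>
    <AgentPlatform adaptor="http://map:8081/adapters/fipaosAdapter.class" platformName="FIPA-OS"/>
  </Distribution>
  <ACLRepresentation ACLParser="http://www.altova.com/ACLString.class" format="String"/>
  <ActiveContext>
    <StartProtocol protocolID="BWPlanning" ontologyID="BlockWorld" role="planning"/>
  </ActiveContext>
</AgentDescription>
"""


def load_agent_config(xml_text):
    """Collect component, protocol, platform and start-up references."""
    root = ET.fromstring(xml_text.strip())
    start = root.find("ActiveContext/StartProtocol")
    return {
        # One (interface description, deployment info) pair per plugged component.
        "components": [
            (c.find("InterfaceDescription").get("href"),
             c.find("DeploymentInfo").get("href"))
            for c in root.findall("Behaviour/Functionality/Component")
        ],
        "protocols": [p.get("href")
                      for p in root.findall("Behaviour/Coordination/ProtocolDescription")],
        "platforms": [p.get("platformName")
                      for p in root.findall("Distribution/AgentPlatform")],
        "acl_format": root.find("ACLRepresentation").get("format"),
        "start": (start.get("protocolID"), start.get("role")),
    }


config = load_agent_config(DEPLOYMENT_XML)
```

Changing the agent's components or protocols then means editing the XML document, not the loading code.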

The Planner Component 

Since our agent architecture does not provide a specific component for planning, we
need to implement a new component providing this capability. For this purpose we
developed a software component wrapping a legacy planner. This planner, which
follows the SNLP algorithm, is implemented in Common Lisp and tries to generate a
plan to achieve a goal given an initial state. The planner uses an extended block world
domain description that includes three predicates and one additional action. The
predicates (have x) and (not-have x) state, respectively, that there is or there is not a
block x on the table. The predicate (available x) states that there is a block x available
in the universe. The action (demand x) procures a block x. This action makes it
possible to introduce communication with other agents in order to acquire new blocks.

A wrapper component, named BWPlanner, offers a service for generating a plan,
accepting as inputs the initial state and the goal state. As stated before, the provided
interface of the wrapper component is described in OWL-S in BWPlanner.owl. This
interface comprises a single service, named GeneratePlan, whose description
as an OWL-S atomic process is depicted in Fig. 3. The OWL-S description includes a
description of the inputs required by the service (named initialState and goalState in
Fig. 3) and the outputs generated (named plan and thereIsAPlan in Fig. 3), all of
them defined as properties of the process GeneratePlan.

We want to point out that, since the planning capabilities are encapsulated in a
software component, it is possible to use a different planner in the architecture
by providing another implementation, either maintaining the same provided interface
or extending it with a new service that generates more than one plan. For this purpose,
the planner can be adapted to provide more than one plan, in one step or in successive
requests.



<process:AtomicProcess rdf:ID="GeneratePlan"/>
<process:input rdf:ID="initialState">
  <rdfs:domain rdf:resource="#GeneratePlan"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</process:input>
+<process:input rdf:ID="goalState">
<process:output rdf:ID="plan">
  <rdfs:domain rdf:resource="#Plan"/>
  <rdfs:range rdf:resource="#ActionList"/>
</process:output>
<process:output rdf:ID="thereIsAPlan">
  <rdfs:domain rdf:resource="#Plan"/>
  <rdfs:range rdf:resource="#ActionList"/>
</process:output>



Fig. 3. OWL-S description of the GeneratePlan service provided by the planner component 



The Planning Protocol 

In our agent model, the invocation of agent functionality, provided as services by
plugged components, is described in the transitions of the protocol description. In the
case of our planning agent, the transition descriptions included in the BWPlanning
protocol describe how to coordinate the invocation of the services provided by the
BWPlanner component, the BlockWorld component and the BasicAgentActions
internal component to generate a plan, negotiate contracts with block suppliers and
execute every action of the plan.

Every TransitionDescription element encloses references to agent actions, such as
generating a plan, moving blocks, and starting a new conversation with another agent
to request a block, that are invoked during the execution of the planning protocol.
Within every transition description, and using control constructors defined in OWL-S,
such as Sequence or If-Then-Else, we can specify how to coordinate the execution of
agent functionality. Fig. 4 depicts the transition description that is executed at the
beginning of the planning protocol. The Sequence control constructor encloses a list
of processes to be invoked in sequential order. As shown in Fig. 4, this transition
first invokes the GeneratePlan service provided by the planner component to
generate a plan. The next component of the Sequence is an If-Then-Else control
constructor, which places a condition on the plan execution. If the output
thereIsAPlan of the GeneratePlan service is true (as detailed in Fig. 3, this output
indicates whether the planner has found a plan), then the transition identified
as ContractSuppliers is executed in order to start the block negotiation with the
supplier agents.



<Sequence>
  <components>
    <DoAction ID="GeneratePlan" ontologyID="BWPlanner">
      <hasInput resource="initialState_input"/>
      <hasInput resource="goalstate_input"/>
      <hasOutput>
        <UnConditionalOutput ID="plan"/>
      </hasOutput>
      <hasOutput>
        <UnConditionalOutput ID="thereIsAPlan"/>
      </hasOutput>
    </DoAction>
    <If-Then-Else>
      <ifCondition>
        <IsTrue resource="thereIsAPlan"/>
      </ifCondition>
      <then>
        <ExecuteTransition IDref="ContractSuppliers"/>
      </then>
    </If-Then-Else>
  </components>
</Sequence>



Fig. 4. XML description of the initial transition of BWPlanning.xml
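
How a connector might execute such a transition can be sketched with a small recursive interpreter. The Python code below is an illustration only: it works on a cleaned-up copy of the Fig. 4 listing, the stub planner and the state strings are made up, and the way services are looked up (by ontology and action identifier) is an assumption rather than the authors' mechanism.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the Fig. 4 transition description.
TRANSITION_XML = """
<Sequence>
  <components>
    <DoAction ID="GeneratePlan" ontologyID="BWPlanner">
      <hasInput resource="initialState_input"/>
      <hasInput resource="goalstate_input"/>
      <hasOutput><UnConditionalOutput ID="plan"/></hasOutput>
      <hasOutput><UnConditionalOutput ID="thereIsAPlan"/></hasOutput>
    </DoAction>
    <If-Then-Else>
      <ifCondition><IsTrue resource="thereIsAPlan"/></ifCondition>
      <then><ExecuteTransition IDref="ContractSuppliers"/></then>
    </If-Then-Else>
  </components>
</Sequence>
"""


def run(node, env, services, fired):
    """Interpret one control construct; env holds named inputs and outputs."""
    if node.tag == "Sequence":
        for child in node.find("components"):
            run(child, env, services, fired)
    elif node.tag == "DoAction":
        inputs = [env[i.get("resource")] for i in node.findall("hasInput")]
        outputs = services[(node.get("ontologyID"), node.get("ID"))](*inputs)
        names = [o.get("ID") for o in node.findall("hasOutput/UnConditionalOutput")]
        env.update(zip(names, outputs))  # bind outputs in declaration order
    elif node.tag == "If-Then-Else":
        flag = node.find("ifCondition/IsTrue").get("resource")
        branch = node.find("then") if env.get(flag) else node.find("else")
        if branch is not None:
            for child in branch:
                run(child, env, services, fired)
    elif node.tag == "ExecuteTransition":
        fired.append(node.get("IDref"))  # record the transition to run next
    else:
        raise ValueError(f"unsupported construct: {node.tag}")


def make_services(found):
    """HYPOTHETICAL stub planner returning (plan, thereIsAPlan)."""
    def generate_plan(initial_state, goal_state):
        plan = ["(demand b)", "(move b a)"] if found else []
        return plan, found
    return {("BWPlanner", "GeneratePlan"): generate_plan}


def execute(found):
    """Run the initial transition with made-up initial and goal states."""
    env = {"initialState_input": "(have a)", "goalstate_input": "(on b a)"}
    fired = []
    run(ET.fromstring(TRANSITION_XML.strip()), env, make_services(found), fired)
    return env, fired
```

With a planner that finds a plan, `execute(True)` records the ContractSuppliers transition; when no plan is found, the listing has no else branch and nothing further is triggered.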



4 Conclusions 

The contribution of this work is a component and aspect-based agent architecture that
combines component orientation and the separation of concerns principle. This
proposal offers several benefits derived from the compositional approach. The (re)use
of third-party components reduces programming errors during implementation,
since such components are expected to be well tested. Provided that a COTS market
exists, agents can be programmed simply by editing XML documents, reducing the time
and effort required to develop agent-based applications. Since agent behavior is not
hard-coded inside the agent, upgrading an agent does not require recompiling source
code. Moreover, our agents benefit from agent platform independence thanks to the
separation of the distribution concern, which lets them run on most FIPA-compliant
agent platforms.

In addition, although the constituent agent components are specified by the
configuration and deployment information, new components can be plugged in at
runtime to address new requirements, and already registered components can be
updated with new releases. We do not impose any methodological approach for the
design of the agent. Instead, we propose applying MDA mechanisms to derive each
agent implementation in our agent model from a design model provided by an
agent-oriented methodology [11].
Abstract. This paper proposes a formal definition of a conceptual model
for cooperation among multi-agent systems (MAS). This work constitutes
part of a general research project aimed at defining a generic
interaction model that covers the different facets of cooperative
activity among MAS. We propose a framework for the specification, design
and verification of interaction mechanisms. In doing so, and using the Z
notation, we bring together the concepts describing a cooperative activity
and integrate them with the concepts related to organizational
modelling within a MAS. Moreover, we suggest a set of properties
needed to maintain consistency at the individual level (Agent) as
well as at the collective level (System). The syntax and semantics of the
proposed specifications have been validated with the Z-eves verification
tool [1].



1 Introduction 

It is widely recognized that the specification phase is essential in the design
process of multi-agent systems, since it makes it possible to master their
complexity. Indeed, formal specifications avoid ambiguity and imprecision and
allow rigorous reasoning about the properties and behaviors of these systems. In
general, the formal specification of multi-agent systems is an intricate task,
owing to the complexity of interaction mechanisms, the heterogeneity of system
components and the lack of consensus about fundamental concepts such as agent,
role, plan, objective, organizational structure, mental state, etc. To
facilitate the formal specification of multi-agent systems, we need a formal
framework that clarifies these concepts, unifies their representation and
defines the relations between them. Many formal frameworks for agent systems
have been proposed, such as the model of Luck and d'Inverno [2], which uses the
Z notation, and the BDI model [3], which uses modal logic. However, most of
these models are limited to particular domains, like [4] for distributed problem
solving. Moreover, we note the near absence of standardization work defining a
generic interaction model that applies to the formal specification of any MAS
regardless of its application domain. With a view to defining a generic
interaction model that integrates the individual aspect (agent) as well as the
collective aspect (society), we focus in this study on the modelling of
cooperation mechanisms. The method we adopted to define our generic model is
based on the study of several multi-agent architectures applied in different
application domains. In a preparatory stage, we draw up a list of concepts
together with the purposes for which they are used in the different
architectures. Then we retain the concepts that appear appropriate to the domain
concerned. Thereafter, we assign a unique purpose to each concept, which unifies
them. In a second main step, we investigate the relations between the retained
concepts, based on their use in the different architectures. Finally, we use
these architectures to formalize the cooperation mechanisms. Multi-agent systems
are used in a wide range of application domains; it was therefore necessary to
restrict our survey to a limited range. In this work, we opted for application
domains in the category of systems formed by software entities that execute in a
virtual environment: multimedia applications ([5] and [6]), electronic commerce
([7] and [8]) and manufacturing systems ([9] and [10]). A study of other fields
should be the subject of future work, to prove the generic character of the
retained concepts.

The concepts retained from the study of the selected architectures are presented
in Section 2, which defines the concepts that characterize the cooperative
aspect of a MAS. In Section 3, the established models are instantiated on a
multi-agent architecture in order to check their consistency. The chosen
architecture is MASPAR (Multi-Agent System for Parsing Arabic) [11]. We conclude
this paper by summarizing our ideas and describing the future directions of our
research.

2 Formal Cooperation Model 

This section treats the different aspects of cooperation in a MAS and the links
between them in two phases: the cooperation model and the cooperative agent
model. The first phase defines the concepts that characterize the activity in a
cooperative MAS, while the second describes the internal aspects of a
cooperative agent as well as its basic properties. These models are formalised
using the Z notation [12].

2.1 Cooperation Model 

To define the cooperation model, we structure our presentation around three
questions: when can we talk about cooperation, how can this cooperation take
place, and how can we quantify the efficiency and the coherence of this
cooperation? These points are detailed in the following subsections.

When Can We Talk About Cooperation? From the study of various multi-agent
application domains, we noted that we can talk about cooperation if there is a
society of agents that intend to reach a common objective. This society requires
a set of roles organized under an organizational structure that determines the
responsibility of each agent.

— A society of agents “Society” is composed of a finite set of agents that
collaborate to achieve a common objective. Each one is defined as a simple
agent or a subset of agents that inter-communicate and cooperate under an
internal control.

A mental state “MentalState” groups together all the individual properties of
an agent. These are classified into three subsets: the capabilities
($\mathit{Capability} == \mathbb{P}\,\mathit{Propriety}$), the knowledge
($\mathit{Knowledge} == \mathbb{P}\,\mathit{Propriety}$) and the beliefs
($\mathit{Belief} == \mathbb{P}\,\mathit{Propriety}$). A simple agent is defined
by the agent schema.



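MentalState
\[
\begin{array}{l}
\mathit{capability} : \mathbb{P}_1\,\mathit{Capability} \\
\mathit{knowledge} : \mathbb{P}\,\mathit{Knowledge} \\
\mathit{belief} : \mathbb{P}\,\mathit{Belief}
\end{array}
\]

agent
\[
\begin{array}{l}
\mathit{MentalState} \\ \hline
\mathit{belief} \cup \mathit{knowledge} \neq \emptyset
\end{array}
\]

Let $\mathit{Agent} ::= A \langle\langle \mathit{agent} \rangle\rangle \mid An \langle\langle \mathbb{F}\,\mathit{agent} \rangle\rangle$.

Society
\[
\begin{array}{l}
\mathit{agents} : \mathbb{P}_1\,\mathit{Agent} \\ \hline
\# \mathit{agents} \geq 2
\end{array}
\]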



— A common objective “CommonObjective” is defined by a nonempty set of global
goals [GlobalGoal] and a set of constraints [Constraint] related to the
running modes of this objective. For abstraction reasons, we consider a global
goal as a set of local goals [LocalGoal], each of which is described by a set
of elementary tasks [Task]. Formally:
$\mathit{LocalGoal} == \mathbb{P}_1\,\mathit{Task}$ and
$\mathit{GlobalGoal} == \mathbb{P}_1\,\mathit{LocalGoal}$.

The global goals of a common objective are related by a variable dependence
relation: a total dependence TD, a partial dependence PD or an
independence ID.

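CommonObjective
\[
\begin{array}{l}
\mathit{BG} : \mathbb{P}_1\,\mathit{GlobalGoal} \\
\mathit{Cr} : \mathbb{P}\,\mathit{Constraint} \\
\mathit{TD} : \mathit{GlobalGoal} \leftrightarrow \mathit{GlobalGoal} \\
\mathit{PD} : \mathit{GlobalGoal} \leftrightarrow \mathit{GlobalGoal} \\
\mathit{ID} : \mathit{GlobalGoal} \leftrightarrow \mathit{GlobalGoal} \\ \hline
\forall \mathit{Bg}_1, \mathit{Bg}_2 : \mathit{BG} \bullet
  (\mathit{Bg}_1, \mathit{Bg}_2) \in \mathit{TD} \lor
  (\mathit{Bg}_1, \mathit{Bg}_2) \in \mathit{PD} \lor
  (\mathit{Bg}_1, \mathit{Bg}_2) \in \mathit{ID}
\end{array}
\]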



— A role “Role” is defined according to the agent's capabilities. In the
studied architectures, we note that the same role can appear in various
applications. Thus, we propose to define the role concept in terms of domain
capabilities [DomainCapability] and control capabilities [ControlCapability].
Formally: $\mathit{DomainCapability} == \mathbb{P}_1\,\mathit{Propriety}$ and
$\mathit{ControlCapability} == \mathbb{P}_1\,\mathit{Propriety}$, such that
$\mathit{DomainCapability} \cap \mathit{ControlCapability} = \emptyset$ and
$\mathit{DomainCapability} \cup \mathit{ControlCapability} = \mathit{Capability}$.

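Role
\[
\begin{array}{l}
\mathit{domainCapability} : \mathbb{P}\,\mathit{DomainCapability} \\
\mathit{ControlCapability} : \mathbb{P}\,\mathit{ControlCapability} \\ \hline
\mathit{domainCapability} \cup \mathit{ControlCapability} \neq \emptyset
\end{array}
\]

Formally, the relation between capabilities and roles is defined by the AgentR
function.

\[
\mathit{AgentR} : \mathbb{P}_1\,\mathit{Capability} \rightarrow \mathbb{P}\,\mathit{Role}
\]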

— A generic organizational structure OrgStructure is defined by a finite set
of relations [OrgRelationship] between the different roles required for the
achievement of a common objective. The relations between a set of roles and
a global goal are defined by the SOrg function.



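OrgStructure
\[
\begin{array}{l}
\mathit{Oc} : \mathit{CommonObjective} \\
R : \mathbb{P}_1\,\mathit{Role} \\
\mathit{Rorg} : \mathbb{P}_1\,\mathit{OrgRelationship} \\
\mathit{SOrg} : \mathit{GlobalGoal} \times \mathbb{P}_1\,\mathit{Role} \rightarrow \mathbb{P}_1\,\mathit{OrgRelationship} \\ \hline
\# R \geq 2 \\
\forall \mathit{Bg} : \mathit{Oc}.\mathit{BG} \bullet \forall r : \mathbb{P}_1\,\mathit{Role} \mid r \subseteq R \bullet \mathit{SOrg}(\mathit{Bg}, r) \subseteq \mathit{Rorg}
\end{array}
\]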
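
The invariants stated by the schemas of this subsection can be read as simple
executable checks. The following Python sketch is an illustration under several
assumptions: sets of plain strings stand in for the Z types, the cardinality
bounds are read as "at least two", and the SOrg function is represented by a
finite dictionary. The function names are introduced here and do not come from
the Z specification.

```python
from itertools import product


def agent_ok(capability, knowledge, belief):
    """'agent' schema: the union of belief and knowledge must be non-empty."""
    return bool(set(belief) | set(knowledge))


def society_ok(agents):
    """'Society' schema, reading the bound as 'at least two agents' (assumption)."""
    return len(set(agents)) >= 2


def objective_ok(BG, TD, PD, ID):
    """'CommonObjective' schema: BG is non-empty and every ordered pair of
    global goals is related by TD, PD or ID."""
    related = set(TD) | set(PD) | set(ID)
    return bool(BG) and all(pair in related for pair in product(BG, repeat=2))


def org_structure_ok(R, Rorg, sorg):
    """'OrgStructure' schema, with SOrg given as a dict (assumption) mapping
    (global_goal, frozenset_of_roles) to a set of organizational relations."""
    return len(set(R)) >= 2 and all(
        roles <= set(R) and links <= set(Rorg)
        for (_goal, roles), links in sorg.items()
    )
```

Note that `objective_ok` quantifies over ordered pairs, including a goal paired
with itself, because that is what the universally quantified predicate states.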



The cooperation model is then described by the CooperationModel schema. This
schema verifies, for each role needed to reach Oc, that there exists at least
one agent endowed with the competence required to take charge of it.

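CooperationModel
\[
\begin{array}{l}
S : \mathit{Society} \\
\mathit{Oc} : \mathit{CommonObjective} \\
R : \mathbb{P}_1\,\mathit{Role} \\
\mathit{Sog} : \mathit{OrgStructure} \\
\mathit{Coop} : \mathit{Society} \times \mathit{CommonObjective} \times \mathbb{P}\,\mathit{Role} \rightarrow \mathit{OrgStructure} \\ \hline
\mathit{Coop}(S, \mathit{Oc}, R) = \mathit{Sog} \\
\forall r : \mathbb{P}_1\,\mathit{Role} \mid r \subseteq R \bullet \exists a : \mathit{agent} \\
\quad \bullet\ \exists \mathit{Ag} : \mathit{Agent} \mid A(a) = \mathit{Ag} \bullet \mathit{Ag} \in S.\mathit{agents} \land r \subseteq \mathit{AgentR}(a.\mathit{capability})
\end{array}
\]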
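
The coverage condition of the CooperationModel schema can be illustrated by a
small Python check. The sketch follows the prose reading (each individual role
must be playable by at least one agent) rather than the quantification over
role subsets, and it assumes a hypothetical capability-to-role table standing in
for AgentR; the capability and role names are illustrative only.

```python
# Hypothetical capability-to-role table standing in for the AgentR function.
ROLE_TABLE = {"segment": {"R_Seg"}, "tag": {"R_Morph"}}


def agent_r(capabilities):
    """Roles that an agent with the given capabilities can play."""
    roles = set()
    for capability in capabilities:
        roles |= ROLE_TABLE.get(capability, set())
    return roles


def uncovered_roles(required_roles, agent_capabilities):
    """Return the required roles that no agent of the society can take on.

    `agent_capabilities` maps an agent name to its set of capabilities.
    An empty result means the coverage condition holds.
    """
    playable = set()
    for capabilities in agent_capabilities.values():
        playable |= agent_r(capabilities)
    return set(required_roles) - playable
```

Returning the uncovered roles, instead of a bare boolean, shows which roles
would have to be assigned before cooperation can start.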



How Can This Cooperation Take Place? A cooperative activity can take place if
a particular set of agents converges towards a global goal when each one
achieves its own local goals. The local goals of an agent express the tasks
assigned to it in order to perform a global goal. The assignment is based on
the distribution of roles, described by the AgentGoal function. This function
verifies the completeness property related to the attribution of the local
goals of a given global goal.

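\[
\begin{array}{l}
\mathit{AgentGoal} : \mathit{CooperationModel} \times \mathit{GlobalGoal} \times \mathit{agent} ⇸ \mathbb{P}\,\mathit{LocalGoal} \\ \hline
\forall C : \mathit{CooperationModel} \bullet \forall \mathit{Bg} : \mathit{GlobalGoal} \mid \mathit{Bg} \in C.\mathit{Oc}.\mathit{BG} \\
\quad \bullet\ \forall a : \mathit{agent} \bullet \forall \mathit{Ag} : \mathit{Agent} \mid A(a) = \mathit{Ag} \land \mathit{Ag} \in C.S.\mathit{agents} \\
\quad \bullet\ \mathit{AgentGoal}(C, \mathit{Bg}, a) \subseteq \mathit{Bg}
\end{array}
\]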
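
A minimal Python sketch of this assignment check is given below. It tests the
stated predicate (each agent's local goals are a subset of the global goal) and,
as an additional assumption about what completeness means, also reports the
local goals that no agent has been given. The function name and the data layout
are illustrative.

```python
def check_agent_goals(global_goal, assignment):
    """Check an assignment of local goals to agents for one global goal.

    global_goal: set of local goals making up the global goal.
    assignment:  dict mapping an agent name to its set of local goals.
    Returns (subset_ok, unassigned): whether every agent only holds local
    goals of this global goal, and which local goals are left unassigned.
    """
    subset_ok = all(goals <= global_goal for goals in assignment.values())
    assigned = set().union(*assignment.values())
    return subset_ok, global_goal - assigned
```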

The distribution of the roles induces an instantiation of the generic
organizational structure, called the concrete organizational structure
ConcreteOrg. This structure can establish additional links needed for the
resolution of the exceptions which can occur while a task is in progress. An
organizational link OrganizationLink makes it possible to associate two or
several agents according to a type of link [LinkType] for a given application
context LinkContext.

For abstraction reasons, we consider LinkContext = GlobalGoal.

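ConcreteOrg
\[
\begin{array}{l}
\mathit{Sog} : \mathit{OrgStructure} \\
S : \mathit{Society} \\
\mathit{organizationlinks} : \mathbb{P}_1\,\mathit{OrganizationLink} \\ \hline
\forall \mathit{Or} : \mathit{organizationlinks} \bullet \forall A : \mathit{Agent} \bullet A \in S.\mathit{agents} \land A \in \mathit{Or}.\mathit{Ag} \\
\forall \mathit{Or} : \mathit{organizationlinks} \bullet \forall l : \mathit{LinkContext} \\
\quad \bullet\ l \in \mathit{Or}.\mathit{linkcontext} \Rightarrow l \in \mathit{Sog}.\mathit{Oc}.\mathit{BG}
\end{array}
\]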



How Can We Quantify the Efficiency and the Coherence of This Cooperation? In
this section, we define two properties related to the quantification of the
efficiency and the performance of a cooperative activity.

• Efficiency: for a cooperative activity to qualify as efficient, it is not
sufficient to perform the local goals constituting the common objective; the
generic constraints attached to this objective must also be satisfied. The
RespectConstraint function measures the ability of a society of agents to
achieve the objective under these constraints. Thus, for each agent, we
verify with the RespectSequence function the existence of a task execution
sequence that respects the retained constraints. Let
$\mathit{bool} ::= \mathit{Yes} \mid \mathit{No}$.

\[
\mathit{RespectSequence} : \mathbb{P}\,\mathit{Task} \times \mathbb{P}_1\,\mathit{Constraint} \rightarrow \mathit{bool}
\]

The Mission function assigns a mission to every agent that defines its 
part of responsibility in the society. 

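\[
\mathit{Mission} : \mathit{Agent} \times \mathit{CommonObjective} \rightarrow \mathbb{P}\,\mathit{Task}
\]

Thus, the RespectConstraint function is defined formally by:

\[
\begin{array}{l}
\mathit{RespectConstraint} : \mathit{Society} \times \mathit{CommonObjective} \rightarrow \mathit{bool} \\ \hline
\forall S : \mathit{Society};\ \mathit{Oc} : \mathit{CommonObjective} \\
\quad \bullet\ \forall a : \mathit{Agent} \mid a \in S.\mathit{agents} \\
\quad \bullet\ \exists M : \mathbb{P}\,\mathit{Task} \mid M = \mathit{Mission}(a, \mathit{Oc}) \\
\quad \bullet\ \mathit{RespectSequence}(M, \mathit{Oc}.\mathit{Cr}) = \mathit{Yes} \Rightarrow \mathit{RespectConstraint}(S, \mathit{Oc}) = \mathit{Yes}
\end{array}
\]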

• Performance of the organization: this consists in verifying, for every
agent, the presence of a subset of organizational links that allows it to
acquire knowledge through other agents, in order to achieve its tasks and
avoid all deadlock situations. The function ExecuteTask determines the
execution state of a task according to the organizational structure in
which the agent is placed. Let
$\mathit{State} ::= \mathit{lock} \mid \mathit{unlock}$.

\[
\begin{array}{l}
\mathit{ExecuteTask} : \mathit{Agent} \times \mathit{ConcreteOrg} \times \mathit{Task} \rightarrow \mathbb{P}\,\mathit{Knowledge} \rightarrow \mathit{State} \\ \hline
\forall A : \mathit{Agent};\ \mathit{Soc} : \mathit{ConcreteOrg} \bullet A \in \mathit{Soc}.S.\mathit{agents}
\end{array}
\]

The verification of the performance of a concrete organization according 
to the achievement of a common objective is measured by the function 
PerformOrg. 

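\[
\begin{array}{l}
\mathit{PerformOrg} : \mathit{Society} \times \mathit{CommonObjective} \times \mathit{ConcreteOrg} \rightarrow \mathit{bool} \\ \hline
\forall S : \mathit{Society};\ \mathit{Oc} : \mathit{CommonObjective};\ \mathit{Soc} : \mathit{ConcreteOrg} \\
\quad \bullet\ \forall a : S.\mathit{agents};\ t : \mathit{Task} \mid t \in \mathit{Mission}(a, \mathit{Oc}) \\
\quad \bullet\ \exists K : \mathbb{P}\,\mathit{Knowledge} \bullet \mathit{ExecuteTask}(a, \mathit{Soc}, t) = K \land \mathit{Result\_Task}(a, t, K) = \mathit{unlock} \\
\quad \Rightarrow \mathit{PerformOrg}(S, \mathit{Oc}, \mathit{Soc}) = \mathit{Yes}
\end{array}
\]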
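
The two properties can be illustrated with a short Python sketch. It rests on
assumptions that go beyond the Z text: constraints are modelled as precedence
pairs (x, y) meaning "x before y", the efficiency predicate is read
conjunctively (every agent's mission must admit a valid ordering), and the
task-execution and result functions are passed in as plain callables. All names
are illustrative.

```python
from itertools import permutations


def respect_sequence(tasks, constraints):
    """True if some ordering of `tasks` satisfies every precedence pair (x, y),
    meaning x must run before y. Brute force, intended for small task sets."""
    for order in permutations(tasks):
        position = {task: i for i, task in enumerate(order)}
        if all(position[x] < position[y]
               for x, y in constraints if x in position and y in position):
            return True
    return False


def respect_constraint(society, mission, constraints):
    """Efficiency: every agent's mission admits a constraint-respecting order."""
    return all(respect_sequence(mission[a], constraints) for a in society)


def perform_org(society, mission, execute_task, result_task):
    """Performance: no task of any agent ends in the 'lock' state.

    execute_task(agent, task) returns the knowledge acquired through the
    organizational links; result_task(agent, task, knowledge) returns
    'lock' or 'unlock'.
    """
    for a in society:
        for t in mission[a]:
            knowledge = execute_task(a, t)
            if result_task(a, t, knowledge) != "unlock":
                return False
    return True
```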



2.2 Cooperative Agent Model 

In contrast to the cooperation model, which treats the collective aspect, the
cooperative agent model deals with the individual aspects of the agent. In order
to obtain a unified and precise definition, we try to capture the essence of a
(cooperating) agent. Ideally, we aim to identify ingredients and properties of
the cooperating agent that are consistent with our cooperation model. The
properties presented below are designed to characterize an agent and to let it
adequately perform the cooperative activity.

We define a cooperating agent as an entity (software or hardware) living in an
environment (which may include other agents), with capabilities and resources
enabling it to perform individual tasks and to cooperate with other agents in
order to reach a common objective. This common objective is reached by achieving
the agent's local goals. Starting from this informal definition, we can add two
representations to the previous description of the mental state: the
acquaintances of the agent, which represent its partial views of the
organizations to which it belongs, and its agenda, which describes the set of
the agent's tasks to be performed on time. This agenda can be updated at any
time by the agent itself.

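\[
\begin{array}{l}
\mathit{Agenda} : \mathit{Agent} \twoheadrightarrow \mathbb{P}(\mathit{Task} \times \mathit{Time}) \\ \hline
\forall a : \mathit{Agent} \bullet \forall \mathit{Oc} : \mathit{CommonObjective} \bullet \\
\quad \forall t : \mathit{Task} \mid t \in \mathit{Mission}(a, \mathit{Oc}) \bullet \exists T : \mathit{Time} \bullet (t, T) \in \mathit{Agenda}(a)
\end{array}
\]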
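
Read operationally, the agenda predicate says that every task of an agent's
mission appears in its agenda with some time. A minimal Python sketch, assuming
an agenda represented as a set of (task, time) pairs with illustrative names:

```python
def unscheduled_tasks(agenda, mission_tasks):
    """Return the mission tasks that have no entry in the agenda.

    agenda: set of (task, time) pairs; mission_tasks: iterable of tasks.
    An empty result means the agenda predicate holds for this agent.
    """
    scheduled = {task for task, _time in agenda}
    return set(mission_tasks) - scheduled
```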

For an agent to evolve in a cooperative society, some properties must be
considered so that it can adequately perform the cooperative activity:

— Autonomy: an agent is said to be autonomous if it is able to make decisions
to perform additional actions or to change its current task. We therefore
suppose that an autonomous agent has the capability of choosing or changing
its agenda at any time, according to its availability and its preferences.

— Perception: the capacity of an agent to perceive its environment and
consequently to update its mental state.

— Auto-Organization: an agent is able to organize itself when it has the
capability to evaluate its interactions with others and to add
organizational links or remove some of them.

Let [EnvState] be the set of environment states. A cooperative agent is defined
by a CooperativeAgent schema that verifies the properties detailed above.



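CooperativeAgent
\[
\begin{array}{l}
\mathit{agent} \\
\mathit{agenda} : \mathbb{P}(\mathit{Task} \times \mathit{Time}) \\
\mathit{views} : \mathbb{P}\,\mathit{OrganizationLink} \\
\mathit{UpdateAgenda} : \mathbb{P}(\mathit{Task} \times \mathit{Time}) \times \mathit{EnvState} \rightarrow \mathbb{P}(\mathit{Task} \times \mathit{Time}) \\
\mathit{Perception} : \mathit{EnvState} \times (\mathit{Knowledge} \cup \mathit{Belief}) \rightarrow (\mathit{Knowledge} \cup \mathit{Belief}) \\
\mathit{AutoOrganization} : \mathit{EnvState} \times \mathbb{P}\,\mathit{OrganizationLink} \rightarrow \mathbb{P}\,\mathit{OrganizationLink} \\ \hline
\forall \mathit{Ag} : \mathit{Agent};\ a : \mathit{agent} \mid \mathit{Ag} = A(a) \bullet \mathit{agenda} = \mathit{Agenda}(\mathit{Ag}) \\
\forall a : \mathit{agent} \bullet \forall \mathit{Oc} : \mathit{CommonObjective} \\
\quad \bullet\ \mathit{views} = \{ \mathit{view} : \mathit{OrganizationLink} \mid A(a) \in \mathit{view}.\mathit{Ag} \}
\end{array}
\]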
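
The last predicate of the schema, which defines the views of an agent as the
organizational links it takes part in, can be sketched in Python as follows.
The `OrganizationLink` record mirrors the three fields used in the paper (Ag,
linktype, linkcontext); its representation as a frozen dataclass and the
function name `views_of` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrganizationLink:
    Ag: frozenset        # agents joined by the link
    linktype: str        # type of link
    linkcontext: str     # global goal for which the link applies


def views_of(agent, links):
    """Partial view of an agent: the links whose agent set contains it."""
    return {link for link in links if agent in link.Ag}
```

An auto-organizing agent would then add links to, or remove links from, this
set after evaluating its interactions.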



The described properties are necessary to conduct a cooperative activity
properly. This list can also be extended with other properties such as
reactivity, the capacity of reasoning, persistence, intelligence, etc.

3 Illustration 

In this section, we validate the formal models defined above by instantiating
them on the MASPAR system (Multi-Agent System for Parsing ARabic) for parsing
Arabic texts. MASPAR provides, for each sentence of the analysed text, the
possible syntactic structures [11]. The agents involved in this system must
satisfy, as presented in the CommonObjective(MASPAR) schema, four global goals:
Bg_1 (text segmentation), Bg_2 (morphological analysis of the words), Bg_3
(syntactic analysis of the sentences) and Bg_4 (resolution of the complex
linguistic forms such as the anaphoric forms and the ellipses). The resolution
of these goals requires five main roles. We denote these roles by:

— R_Seg: splits the text into sentences and segments each one into words,

— R_Morph: corresponds to the morphological analysis,

— R_Syn: corresponds to the syntactic analysis of the sentences,

— R_Anph: resolves the anaphoric forms in the sentence,

— R_Elip: detects and rebuilds the various types of ellipses.

As defined in our cooperation model, the roles must be organized according to
an organizational structure in order to collaborate in the achievement of a
global goal.

In the MASPAR system, we identify three types of organizational links according
to [13]: the acquaintance links (Ro1), the communication links (Ro2), and the
operative links (Ro3). The set of relations established between the different
listed roles constitutes the generic organizational structure of MASPAR, called
OrgStructure(MASPAR). We present, as an example, the organizational links
between the R_Morph role and the R_Seg role in order to achieve the
corresponding global goal: SOrg(Bg_1, R_Morph, R_Seg) = {Ro3, Ro2}.

The MASPAR architecture is based on a functional approach which associates one
of the roles cited above with each agent. Thus, the set of agents is composed
of: the Segmentation Agent (A_Seg), the Morphologic Agent (A_Morph), the
Syntactical Agent (A_Syn), the Anaphoric Agent (A_Anph) and the Elliptic Agent
(A_Elip). A_Morph is composed of two agents: an agent for the morphological
pre-treatments and another agent for the morphological tagging. This case
illustrates the recursive aspect in the Agent definition.
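
The composition of A_Morph can be illustrated with a small Python model of the
free type Agent ::= A⟨⟨agent⟩⟩ | An⟨⟨F agent⟩⟩. This is a sketch under
assumptions: the class names `Simple` and `Group` stand for the constructors A
and An, and the two sub-agent names are labels chosen here from the description
above, not identifiers from MASPAR.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Simple:                 # constructor A: a single simple agent
    name: str


@dataclass(frozen=True)
class Group:                  # constructor An: a finite set of simple agents
    name: str
    members: FrozenSet[Simple]


Agent = Union[Simple, Group]


def simple_agents(agent):
    """Flatten an Agent into the set of simple agents it is made of."""
    if isinstance(agent, Group):
        return set(agent.members)
    return {agent}


# Illustrative instantiation of the MASPAR society described in the text.
a_morph = Group("A_Morph", frozenset({Simple("pretreatment"), Simple("tagging")}))
society = {Simple("A_Seg"), a_morph, Simple("A_Syn"), Simple("A_Anph"), Simple("A_Elip")}
```

The society still counts five agents at the organizational level, while six
simple agents do the actual work.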

The distribution of the roles among agents defines the concrete organizational
structure, which describes the organizational links between these agents. We
present, as an example, the relations between the A_Seg agent and the A_Morph
agent by the OrganisationLink(Seg_Morph) schema.

OrganisationLink(Seg_Morph)

Ag == {A_Seg, A_Morph}
linktype == {Ro3} 
linkcontext == {BgA} 



CommonObjective(MASPAR) 

BG == {Bg_ 1, Bg_2 , Bg_ 3, Bg_ 4} 
ID (BgS, Bg_1)
PD (Bg_2, BgS)
PD (BgS, BgA)
PD (BgA, Bg_3)
PD (Bg_3, BgS)
PD (Bg_2, Bg_3)

4 Conclusion

The generic modelling of interaction mechanisms is still an open research
domain. This work proposes a generic conceptual cooperation model using the Z
notation. This model defines the concepts describing a cooperative activity
while integrating them with the concepts related to organizational modelling.
An instantiation of these concepts on the MASPAR system [11] constitutes a first
validation of the proposed models. We also proposed the specification of the
properties needed to maintain consistency at the individual level (agent) as
well as at the collective level (Society). This specification has been supported
by the type checking and the theorem proving of the Z-Eves tool [1].

These results deserve some improvements. Indeed, it is advisable to associate
the concepts related to coordination and negotiation with those of cooperation
in order to improve the action of the group, either by increasing performance or
by reducing conflicts. The association of the concepts related to cooperation,
negotiation and coordination constitutes our first perspective for future work;
it will make it possible to define a conceptual interaction model. As a second
perspective, we propose the definition of a cooperation and negotiation language
similar to the coordination language proposed in [14]. This language will be
defined by a precise syntax and an operational semantics in terms of transition
systems. These interaction languages are to be integrated in a platform for the
design and implementation of MAS.

Abstract. In this paper we describe a component architecture for web-enabling
MultiAgent Systems intended for the deployment of distributed artificial
intelligence applications. This integration allows agents to publish Web
Services or standard HTML, thus providing a convenient interface for other
distributed components. By using an Embedded Web Services approach we provide a
simple and efficient mechanism for enabling agents to interoperate with their
users and with enterprise systems such as portals or client-server applications.
The architecture we propose is presented from a structural as well as a
functional point of view. Comparisons are drawn with other proposals for
integrating intelligent agents with web services, and experimental performance
measurements show the advantages of our approach.



1 Introduction 

In the past 15 years, agent technologies have evolved with their own set of
standards, languages [1] and applications, and only in recent years have the
issues concerning interoperability between agents and more traditional
applications begun to be addressed. At the time of this writing we may say that
agent development has found some stability at the implementation language level,
with tools and frameworks available in mature object-oriented languages, such as
the JADE platform [2].

However, having software agents interoperate with users and other applications
through web-based interfaces remains a critical issue for the widespread
adoption of agent technologies. In this paper we describe a component
architecture for web-enabling MultiAgent Systems intended for the deployment of
real-world distributed artificial intelligence applications. This integration
allows agents to publish XML [3] Web Services [4] or standard HTML, thus
providing a convenient interface for other distributed components.

Throughout this discussion we will use the Web Service definition stating that
a Web Service is “a software system designed to support interoperable
machine-to-machine interaction over a network. It has an interface described in
a machine-processable format (specifically WSDL [5]). Other systems interact
with the Web Service in a manner prescribed by its description using
SOAP-messages [6], typically conveyed using HTTP with an XML [3] serialization
in conjunction with other Web-related standards.” [4].

We consider as well the following definitions:

— A component, according to Fielding [7], is “an abstract unit of software
instructions and internal state that provides a transformation of data via its
interface”. Data transformed by a web component is intended to be used in
HTTP conversations and may be HTML for web pages or XML [3] for Web Services.
We consider the use of Servlets as they are the Java standard user-defined
web components. JSR-154 defines them as “A Java technology-based web
component managed by a container that generates dynamic content” [8].

— A container is “an entity that provides life cycle management, security,
deployment, and runtime services to components” [9]. There exist well-defined
containers for software agents as well as for web components. Agent platforms
like JADE [2] provide containers that function as execution environments for
agents. As specified in JSR-154 [8], web containers are usually built within
web servers and provide network services related to HTTP request processing.

2 Architectural Approaches 

As mentioned above, web-enabling MultiAgent Systems requires interoperability
both at the component and at the container level; however, existing solutions
address each concern separately. The main approaches identified are discussed
below.

2.1 Component Interactions, a Gateway Approach

The authors of the JADE platform have defined a basic approach for communication
between agents and web components that conforms to the transducer approach
identified by Nwana [10].

The solution assumes the existence of two agent containers, one standalone, 
that we may call the main container and one contained itself within the web con- 
tainer. Each container is executed in a separate JVM 1 system process. Whitestein 
Technologies A.G. developed the WSAI[11], an open-source product which re- 
fines and implements the initial architecture for use in the AgentCities[12] project. 

WSAI introduces the concept of a “Gateway Agent”: an agent living in the web
container that is responsible for translating HTTP requests into ACL messages.
The general gateway architecture components are shown in figure 1.

One of the major drawbacks of the approach resides in the existence of several
processes that must synchronize using remote method invocations even if both of
them are deployed on the same machine. The synchronization problem is addressed
by instantiating the “Gateway Agent” and “Caller” objects on a per-request
basis. “Caller” objects implement the Java synchronization and timeout handling,
as shown in figure 2.
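
The per-request synchronization with timeout handling can be sketched as
follows. This is a Python analogue written for illustration, not WSAI code: the
class `Caller` only borrows its name from the description above, the functions
`handle_request`, `echo_agent` and `silent_agent` are hypothetical, and the
remote method invocation between processes is replaced by a local thread.

```python
import queue
import threading


class Caller:
    """Per-request object: blocks the web thread until the agent replies
    or the timeout expires."""

    def __init__(self):
        self._reply = queue.Queue(maxsize=1)

    def deliver(self, result):
        """Called on the agent side when the reply is ready."""
        self._reply.put(result)

    def wait(self, timeout):
        """Called on the web side; returns None when the timeout expires."""
        try:
            return self._reply.get(timeout=timeout)
        except queue.Empty:
            return None


def handle_request(payload, agent, timeout=1.0):
    """Create a fresh Caller for this request and wait for the agent."""
    caller = Caller()
    threading.Thread(target=agent, args=(payload, caller), daemon=True).start()
    return caller.wait(timeout)


def echo_agent(payload, caller):
    """Example agent that answers immediately."""
    caller.deliver(payload.upper())


def silent_agent(payload, caller):
    """Example agent that never answers, to exercise the timeout path."""
```

In the gateway architecture each of these hand-offs also crosses a process
boundary, which is the source of the overhead discussed above.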

An AgentCities [12] requirement that adds considerable complexity to the WSAI
is that it was designed to work with any running FIPA-compliant agent (even
non-Java-based ones) without access to its source code. As a consequence, the
implementation effort moves into the web tier, whose complexity required the
project to rely on code generation tools.

Fig. 1. Gateway Architecture. Component View (legend: local interaction, remote
interaction, component, container, executes, process/thread, synchronizes with)

Fig. 2. Gateway Approach. Interactions Sequence Diagram

2.2 Container Interactions, a Managed Applications Approach

Cowan and Griss [13] proposed BlueJADE, a connector that allows the management
of a JADE platform by a Java application server, and therefore its interaction
with the enterprise applications deployed inside it. BlueJADE's main strengths
occur at the container level. Although the containers run separately, the
connector eliminates the need for remote calls, which enables the use of the
object-agent channels. Figure 3 shows the coupling enhancement in the resulting
architecture.



3 An Embedded Solution 

3.1 Design Goals 

In this work we are proposing an architecture for Web Service and agent inte- 
gration that synthesizes and simplifies the preceding approaches in a way that 
achieves the goals of: 

— Reducing the time to market of agent-based applications. 

— Delivering good performance for large volume of users. 

— Improving the code base maintainability. 

This solution assumes that the agent and web component source code is
available. Compared with the architectures presented above, this solution mostly
trades off flexibility and manageability in favor of simplicity.

3.2 Component Model 

In order to outline the present solution we started by conceiving the
agent-based application as a “black box” which exposes several high-level web
interfaces to its users. With respect to the gateway approach, this idea leads
to a rather opposite approach that embeds the web container within the context
of the agent platform. As a result, by increasing internal cohesion and reducing
coupling with other applications, an agent system can be packaged with its own
specialized set of Web Services and operated via an embedded web application.
The component view of the architecture is shown in figure 4.

Notice that some of the application server infrastructure present in the
BlueJADE architecture is replaced by lightweight custom programs. In this
particular case, the Launcher is a main program that initializes and starts the
JADE platform alongside an embedded instance of the Tomcat Web Server [14]. The
Registry is simply a data structure, implemented as a Singleton [15], that holds
references to the running Service Agents.
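
The role of the Registry can be illustrated with a short Python sketch of a
Singleton holding references to service agents. It is an analogue written for
this discussion: the method names `register` and `lookup` and the `launch`
helper are assumptions, and no agent platform or web server is actually started.

```python
import threading


class Registry:
    """Process-wide table of running service agents (Singleton)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Create the single instance lazily; the lock guards concurrent first use.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._agents = {}
        return cls._instance

    def register(self, name, agent):
        self._agents[name] = agent

    def lookup(self, name):
        """Return the registered agent, or None if the name is unknown."""
        return self._agents.get(name)


def launch(agents):
    """Stand-in for the Launcher: register the service agents so that web
    components running in the same process can find them."""
    registry = Registry()
    for name, agent in agents.items():
        registry.register(name, agent)
    return registry
```

Because the agent platform and the web container share one process, a web
component can reach an agent with a plain in-memory lookup instead of a remote
call.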

3.3 Interaction Model 

Fig. 3. Managed Applications Approach. Component View

Fig. 4. Embedded Services Architecture. Component View

Fig. 5. Embedded Services Architecture. Interactions Sequence Diagram (recovered
labels: doService(), CallMonitor.hasResult())

At this point it is easy to see how an interaction model can work without
performing any remote method invocation. The model shown in figure 5 considers
interactions similar to the ones in the gateway architecture until the HTTP
request is delivered to the Gateway Agent. In this embedded architecture the
complex issues of interaction are handled between two core framework
components:

— A call monitor object that serves as a shared memory space between the
agent and the web component, and handles synchronization between them.

— A service behaviour object used by agents for exposing internal tasks as Web
Services.
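
The interplay of these two components can be sketched in Python. This is an
illustrative analogue under stated assumptions, not the framework's Java code:
`CallMonitor` borrows its name from the description above, while
`start_agent`, `do_service` and the use of an in-process queue as the agent's
object channel are hypothetical choices made for the example.

```python
import queue
import threading


class CallMonitor:
    """Shared slot between one web request thread and one agent thread."""

    def __init__(self, request):
        self.request = request
        self._result = None
        self._done = False
        self._cond = threading.Condition()

    def has_result(self):
        with self._cond:
            return self._done

    def set_result(self, result):
        """Called by the agent's service behaviour when the task is finished."""
        with self._cond:
            self._result = result
            self._done = True
            self._cond.notify_all()

    def await_result(self, timeout):
        """Called by the web component; returns None if the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._done, timeout=timeout)
            return self._result if self._done else None


def start_agent(behaviour):
    """Start a thread-per-agent loop that serves monitors from its inbox."""
    inbox = queue.Queue()

    def loop():
        while True:
            monitor = inbox.get()
            monitor.set_result(behaviour(monitor.request))

    threading.Thread(target=loop, daemon=True).start()
    return inbox


def do_service(inbox, request, timeout=1.0):
    """Web-component side: hand the monitor to the agent and wait for it."""
    monitor = CallMonitor(request)
    inbox.put(monitor)
    return monitor.await_result(timeout)
```

The monitor is passed by reference through the agent's object queue, so the
request and its result never leave the process or need serialization.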

4 Evaluation 

By means of simple metrics we can show some of the desirable properties of the
proposed architecture. As seen in table 1, a functional implementation for
embedding Web Services into agent-based applications is several times smaller
than the implementation of a gateway framework from the WSAI project.

Although the number of classes is not an absolute metric of software
complexity, from the interaction models we can deduce that the interactions
present in the embedded model (figure 5) are a subset of the interactions
required in a gateway approach (figure 2); as a consequence the internal
complexity of the embedded Web Services architecture (EWSA) is not significantly
different from that of the gateway architecture. In the embedded architecture
only one abstract class is provided as an extension point for service
developers, which consequently simplifies the agent and Web Service
implementation, as shown in table 2.

Table 1. Integration framework size comparison

Package     Total classes   Abstract   Concrete
Embedded    6               1          5
Gateway     52              6          46



This comparison is provided as a reference for the resulting programming
models, considering that even if a code generation tool could simplify the
development process, the additional interactions remain, with a significant
performance penalty.

A benchmark between the WSAI [11] sample currency exchange service and 
the same service implemented in the embedded architecture shows a clear im- 
provement both in performance and in scalability. The performance gain in the 
embedded architecture can be interpreted as a result of the elimination of the 
nested network calls and their respective serialization and transmission overhead. 
Test results are shown in figure 6. 

4.1 Requirements 

Table 2. Example implementation size comparison

Package     Total classes   Abstract   Concrete
Embedded    2               0          2
Gateway     4               1          3

In enterprise environments the problem of web-enabling MultiAgent Systems may
be stated in terms of the required interactions of software agents, web
components, and their respective containers.

Although we consider specific products to implement the solution, in its more
general form the proposed integration architecture could be implemented under
several basic assumptions about the technologies used to produce web-enabled
MultiAgent Systems, notably:



— Components and containers are implemented in a general purpose, inter- 
preted 2 , object-oriented language with thread support such as Java, Python 
or C#. 

— For the language selected there exists an agent container that conforms to 
the FIPA Abstract Architecture [16] and Agent Management specifications. 
For the Java language we tested the container provided by the JADE [2] 
agent platform. 

— Service agents, defined as those that process requests formulated by a hu- 
man user or application, are implemented in a thread-per-agent model, in 
which agents periodically execute a set of tasks or behaviours as described 
by Rimassa[17]. 

— Agents have two communication queues or “virtual channels” that enable
them to process FIPA ACL[18] messages or plain (Java) objects.

— There is a web container, that provides the execution environment for the 
web components. We integrated the Tomcat Web Server[14] that conforms 
to the Java Servlet API formally described in the JSR-154[8] specification. 

— There is a well specified user-defined web component programming model. 
We consider the use of “Servlets” as they are a standardized and mature
technology for the Java platform. 
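
As a rough illustration of the thread-per-agent and two-queue assumptions
listed above, the following Python sketch runs one thread per agent and polls
both queues in turn. It is not the JADE API: the names ServiceAgent, acl_queue,
object_queue and handled are hypothetical, and the polling loop is only one of
several ways such an agent could be scheduled.

```python
import queue
import threading


class ServiceAgent(threading.Thread):
    """Hypothetical agent: one thread, two inbound queues."""

    def __init__(self):
        super().__init__(daemon=True)
        self.acl_queue = queue.Queue()     # stands in for the ACL message channel
        self.object_queue = queue.Queue()  # stands in for the plain-object channel
        self.handled = []                  # record of (channel, item) pairs
        self._halt = threading.Event()

    def run(self):
        # Periodically poll both channels until asked to stop.
        channels = (("acl", self.acl_queue), ("object", self.object_queue))
        while not self._halt.is_set():
            for name, channel in channels:
                try:
                    item = channel.get(timeout=0.01)
                except queue.Empty:
                    continue
                self.handled.append((name, item))
                channel.task_done()

    def stop(self):
        self._halt.set()
```

A web component would then hand a request object to object_queue, while other
agents would use acl_queue; both are served by the same agent thread.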

The embedded implementation we provide is useful in any Java based agent 
development environment, whenever the agent and web component source code 
is available. Adaptation for other languages and environments could be possible, 
though we don’t delve into this question here. 

5 Conclusions 

In this paper we have outlined a simple and efficient architecture for embedding 
Web Services into agent-based applications. We presented experimental evidence 
to support our claims about simplicity and efficiency. 

The embedded architecture synthesizes key features of other architectural 
proposals into an effective solution for developing web-enabled agent-based ap- 
plications. It has been successfully used in “real-world” scenarios like the JITIK 3 
Project, that is already supporting several collaborative environments. 

We are starting work on methodologies for using the proposed architecture
so that hybrid agent-web systems can be produced in a systematic way.

2 That relies on platform-specific interpreters or virtual machines 

3 Just-in-time information and knowledge [19], http://lizt.mty.itesm.mx/jitik/ 

Abstract. We present our bayesian-modeler agent, which uses a probabilistic
approach for agent modeling. It learns models about the others using a
Bayesian mechanism and then plays in a rational way using a decision-theoretic
approach. We also describe our empirical study evaluating the competitive
advantage of our modeler agent. We explore a range of strategies from the
least- to the most-informed one in order to evaluate the lower and upper
limits of a modeler agent’s performance. For comparison purposes, we also
developed and experimented with other modeler agents using reinforcement
learning techniques. Our experimental results show that an agent that learns
models about the others using our probabilistic approach reaches almost the
optimal performance of the oracle agent. Our experiments also show that a
modeler agent using a reinforcement learning technique does not perform as
well as the bayesian modeler. However, it could be competitive under different
assumptions and restrictions.



1 Introduction 

A key concern in modeling other agents is the prediction of their behavior
(exploiting the internal models about the others’ preferences, strategies,
intentions, and so forth). The modeler agent can then use this prediction in
order to behave in the best way according to his preferences.

Research on modeling other agents has been approached from different per- 
spectives. Tambe et al [1] have proposed an approach for tracking recursive agent 
models based on a plan recognition task. Gmytrasiewicz [2] has presented the 
Recursive Modeling Method (RMM) which uses nested models of other agents, 
combining game-theoretic and decision-theoretic mechanisms. Suryadi and Gmy- 
trasiewicz [3] have proposed the use of influence diagrams for learning models 
about other agents. Vidal and Durfee [4] have developed an algorithm to
determine which of the nested models are important for choosing actions
effectively. These authors have also presented a framework for determining the
complexities of learning nested models [5].

There have been other papers sharing our view of learning agent models in
an iterative way using a Bayesian approach. For instance, Gmytrasiewicz et al
[6] have proposed a framework for Bayesian updating of agent models within
the formalism of the RMM, and Zeng and Sycara [7] have presented experimental
research in which a buyer models the supplier under a Bayesian representation
in Bazaar, a sequential decision-making model of negotiation.

In [8], we reported some results on evaluating, in an experimental way, the 
advantage an agent can obtain by building models about the others’ roles and 
strategies. We explored a range of strategies from least- to most-informed in 
order to evaluate the upper- and lower- limits of our modeler agent performance. 
Now, in this paper we present an empirical evaluation of our bayesian-modeler 
and an agent learning models about other agents using different reinforcement 
learning strategies. 

In the following sections, we first give a quick review of our experimental
framework. We also review the basic non-modeling strategies for comparison
purposes. Then, we present our bayesian-modeler and reinforcement-modeler
agents, followed by our experimental scenarios and a discussion of the results
we obtained. Finally, we present the conclusions of this paper.



2 Experimental Framework 

We have implemented the Meeting Scheduling Game (MSG) as our experimental
testbed, which models some characteristics of the distributed meeting
scheduling problem. Our main concerns in creating this testbed were to allow
self-interested as well as cooperative behavior, to show or hide players’
private information, and to define different players’ roles and strategies.

In this game, a group of agents try to arrange a meeting in such a way that a
certain meeting slot is available for as many players as possible. Each
player thus tries to arrange a meeting at a convenient and free time slot with
an acceptable utility for him.

Each player’s role is defined by a preference profile which is coded as a cal- 
endar slot utility function, ranking each slot from the most preferable slot to 
the least preferable one. We have defined several agent roles. For example:
the Early-Rising prefers the early hours of the day; the Night-Owl prefers
meetings to be scheduled as late as possible; the Medium prefers meetings to
be around noon; the Extreme prefers to have meetings early in the morning or
late in the afternoon.

Players’ strategies are rules that tell them what actions to choose at each
decision point. Strategies can take into account only the player’s own
preference profile, or they can also use models about the others. In the
subsequent sections we will define several different strategies.

Since a combination of a role and a strategy defines a player’s preferences
and behavior, the role/strategy pair of a player is seen as his personality
in the MSG.

Each player proposes a slot taken from his own calendar, which consists of a
working day of eight hours. Each player’s calendar is set at a specific
calendar density, which is the proportion of busy hours in the calendar. The
goal of a player in the MSG is to accumulate more points than his competitors
in the game. A game consists of a predefined number of rounds, and each player
tries to accumulate points after each round.

There is a referee who ensures that all the players obey the rules of the game. 
He is also responsible for accumulating points for each agent after each round in 
an individual point counter for each player through the whole game. 

After each round, each player’s calendar is randomly reset, scrambling free
and busy slots while maintaining the same predefined calendar density. Then,
another round is started, and the process is repeated until the predefined
number of rounds is completed. Note that this implies we are not really
“scheduling” any meetings, as the winning slots do not carry over from one
round to the next.

In each round, every player simultaneously proposes a slot, basically
according to his own individual strategy and role. However, the players’
proposals are not completely determined by their own personalities because
some slots can be busy in their calendars. In the first round, each player
randomly proposes an available slot. These initial random proposals are needed
as a “bootstrap” for the collaborative strategy defined in the following
section. The other strategies are not affected by this initial round, since
this is the only round where nobody accumulates points.

After all the players make their proposals, several teams are formed. Each
team is composed of all those players who proposed the same calendar slot.
Then, each team’s joint utility is calculated by summing up all the team
members’ calendar utilities: $TJU(t) = \sum_{m \in t} U_m(s_t)$. Here, $t$ is
a team, $m$ is a member of the team, $s_t$ is the slot proposed by the members
of $t$, and $U_m$ is the slot utility of member $m$. Finally, the round is won
by the team which accumulates the greatest team joint utility.

Once the winning team is selected, each agent earns points according to
the following predefined scoring procedure: all the players outside the winning
team accumulate zero points for that round, and each agent $a$ in the winning
team $t$ accumulates his own slot utility plus the team joint utility:
$G_a(s_t) = TJU(t) + U_a(s_t)$.
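
The team formation and scoring procedure just described can be sketched in a
few lines of Python. This is only an illustration under stated assumptions:
the function name score_round and the data layout (dictionaries keyed by agent
and slot) are hypothetical, and the text does not say how ties between teams
are broken, so the sketch simply keeps the first team found.

```python
from collections import defaultdict


def score_round(proposals, utility):
    """Return the points each agent earns in one round.

    proposals: agent -> proposed slot
    utility:   agent -> {slot: slot utility for that agent}
    """
    # A team is the set of agents that proposed the same slot.
    teams = defaultdict(list)
    for agent, slot in proposals.items():
        teams[slot].append(agent)

    # Team joint utility: sum of the members' utilities for the team's slot.
    tju = {slot: sum(utility[m][slot] for m in members)
           for slot, members in teams.items()}

    # Assumption: ties are resolved in favour of the first team encountered.
    winning_slot = max(tju, key=tju.get)

    points = {agent: 0 for agent in proposals}
    for agent in teams[winning_slot]:
        points[agent] = tju[winning_slot] + utility[agent][winning_slot]
    return points
```

For instance, with two agents proposing slot 0 at utility 8 each and a third
proposing slot 7 at utility 8, the first team wins with a joint utility of 16
and each of its members earns 16 + 8 = 24 points.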

Although this game is based on the general distributed meeting scheduling 
problem, it resembles only some of its characteristics. 



3 Basic Strategies 

We set up a framework for characterizing all the possible strategies in the
MSG, ranging from the least-informed to the most-informed one. This allows us
to place every given strategy in a framework where it can be better compared
to others, and in particular to place modeling strategies in context. The
lower and upper limits of our framework are given by the following strategies:

Indifferent Strategy: An agent using this strategy chooses his next proposal
among his action set using a uniform (equiprobable) distribution.

Oracle Strategy: An agent using this strategy can see in advance the others’
next move because he knows the other agents’ calendars, roles and strategies.
For each free slot $s$ in his calendar, he calculates his possible gain
$G_o(s)$ if he proposed that slot. Then, he finds the agent $m$ who would earn
the maximum gain $G_m(s)$ among the rest of the players if he proposed that
slot. Then, he calculates the utility of each slot $s$ as his gain with
respect to the profit of agent $m$: $U(s) = G_o(s) - G_m(s)$. After checking
all his free slots, he proposes the slot with the highest utility:
$\arg\max_s U(s)$.

An indifferent agent does not take into account any information about the
other agents. He does not even take into consideration his own preferences.
However, he must propose a free slot in his calendar, as must agents using any
of the other strategies. This strategy is considered the lower limit for every
“reasonable” strategy, since a strategy performing worse than random choice is
hardly worth considering.

An oracle agent knows the roles and strategies of the other agents (i.e. he has
the correct models about the others). Furthermore, he even knows the others’
calendars. An oracle agent is therefore able to see the others’ moves in
advance, and he simply proposes the slot that maximizes his utility in each
round of the game. Although an oracle agent has the best chances of winning
each round, he cannot always win, because his calendar availability is random,
subject to the fixed calendar density.
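
The oracle's slot selection can be illustrated with the following Python
sketch. It assumes that the oracle already knows what each other agent will
propose (passed in as others_proposals) and reuses the scoring rule of section
2; the function names round_gains and oracle_choice, the data layout, and the
tie-breaking between teams are assumptions made for the example.

```python
from collections import defaultdict


def round_gains(proposals, utility):
    """Points per agent for one round, following the scoring rule of section 2."""
    teams = defaultdict(list)
    for agent, slot in proposals.items():
        teams[slot].append(agent)
    tju = {slot: sum(utility[m][slot] for m in members)
           for slot, members in teams.items()}
    winner = max(tju, key=tju.get)  # assumption: first team wins a tie
    gains = {agent: 0 for agent in proposals}
    for agent in teams[winner]:
        gains[agent] = tju[winner] + utility[agent][winner]
    return gains


def oracle_choice(me, free_slots, others_proposals, utility):
    """Pick the free slot s maximizing U(s) = G_o(s) - G_m(s)."""
    best_slot, best_value = None, float("-inf")
    for s in free_slots:
        proposals = dict(others_proposals)
        proposals[me] = s
        gains = round_gains(proposals, utility)
        # G_m(s): the largest gain among the other players if s is proposed.
        rival_gain = max(gains[a] for a in others_proposals)
        value = gains[me] - rival_gain
        if value > best_value:
            best_slot, best_value = s, value
    return best_slot
```

Note that the oracle does not simply maximize his own gain: a slot that lets a
rival earn even more is penalized by the subtraction.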

In order to have additional points of reference, we have also defined the 
following two heuristic strategies: 

Self-Centered Strategy: This strategy tells the agent always to choose the 
free slot which just maximizes his own calendar slot utility. 

Collaborative Strategy: Using this strategy, the agent chooses the free slot
that was proposed by the biggest team (greatest number of members) in the
previous round. In case of ties, the agent ranks them according to his own
calendar slot utility.

A self-centered agent does not consider information about the other agents,
but he takes into account his own role. A collaborative agent also takes into
account his own role. In addition, he takes into consideration information
about the previous round, trying to join the biggest observed team.
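
The two heuristic strategies are simple enough to be written down directly.
The Python sketch below is one possible reading: the function names and
arguments are hypothetical, and when the biggest team's slot is busy in the
agent's calendar the sketch falls back to the free slot with the next biggest
team, which is an assumption not stated in the text.

```python
from collections import Counter


def self_centered(free_slots, my_utility):
    """Choose the free slot with the highest own slot utility."""
    return max(free_slots, key=lambda s: my_utility[s])


def collaborative(free_slots, my_utility, previous_proposals):
    """Choose the free slot proposed by the biggest team in the previous round.

    previous_proposals is the list of slots proposed in the previous round.
    Ties between equally big teams are broken by the agent's own slot utility.
    """
    team_size = Counter(previous_proposals)  # missing slots count as size 0
    return max(free_slots, key=lambda s: (team_size[s], my_utility[s]))
```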



4 Bayesian Strategy 

Let us first introduce our term model about another agent. We simply see it as
a vector which records a probability distribution over the actual character of
the modeled agent. In the context of the MSG, each agent has two basic models
about each other agent $a$. The first one is the role model:
$r_a = (r_1, \ldots, r_n)$. Here, each $r_i$ is the probability that agent $a$
has the particular role $i$, and $n$ is the number of different predefined
roles. The notation $r_a(i)$ refers to the probability $r_i$ of role $i$. The
second model used in the MSG is the strategy model:
$s_a = (s_1, \ldots, s_m)$. Here, each $s_i$ is the probability that agent $a$
has strategy $i$, and $m$ is the number of different predefined strategies.
The notation $s_a(i)$ refers to the probability $s_i$ of strategy $i$.

Since we are assuming independence between roles and strategies in the MSG
(section 2), it is easy to construct a new combined model for each other agent:
the personality model. This model is just a two-dimensional matrix, $rs_a$,
where each element $rs_a(i,j)$ is calculated as follows:
$rs_a(i,j) = r_a(i)\,s_a(j)$. Now, let us define a decision-theoretic strategy
that takes explicit advantage of knowing the others’ models:

Semi-modeler Strategy: This strategy tells the agent to choose the slot which 
maximizes his expected utility based on predefined fixed models about the 
other agents. 

It is assumed that a semi-modeler agent already has models about the others,
and his strategy just uses these probabilistic models to choose the action that
maximizes his expected utility. The models are given to the semi-modeler agent
at the beginning of the game and they never change throughout the game. It is
also important to note that the given models are not necessarily correct models
about the others. In [9] we already presented a more detailed description of the
semi-modeler agent.

In order to build a modeler agent, model construction is required. Let us
define a modeler strategy that uses a Bayesian updating mechanism in order
to build the others’ models in an incremental and iterative way:

Bayesian-Modeler Strategy: An agent using this strategy incrementally 
builds models about the others using a Bayesian belief updating approach 
and chooses the action which maximizes his expected utility.

A bayesian-modeler agent does not have any information about the others. 
However, as stated in section 2, the sets of predefined roles and strategies are
public knowledge. At the beginning, the modeler agent can behave as a semi- 
modeler agent with equiprobable models about the others. That is, with no other 
knowledge about the others, it is reasonable to start with equiprobable proba- 
bility distributions of the possible traits about the others. Then, the modeler 
agent can start to update those models based on the others’ behavior. 

This agent builds models about the other agents in an incremental and it- 
erative way, updating those models after each round during the whole game. 
All the probabilities of each model are incrementally updated, trying to reach 
the actual character of the agent being modeled. The detailed bayesian-modeler 
strategy is as follows: 

1. At the first round, start with equiprobable models about the others, run the 
semi-modeler strategy, and propose the resulting slot. 

2. At the next round, for each other agent $a$:

(a) Observe $a$’s proposal, $s_a$, in the previous round and update $a$’s
personality model, $rs_a$, using a Bayesian updating mechanism to obtain the
corresponding posterior probabilities of $a$’s personality, $per_a(i,j)$,
given that $a$ proposed slot $s_a$, $pro_a(s_a)$, in the previous round:
$rs_a(i,j) = P(per_a(i,j) \mid pro_a(s_a))$.

(b) Decompose $a$’s updated personality model in order to build two new
separate role and strategy models. That is, update each element in $r_a$
and $s_a$: $r_a(i) = \sum_j rs_a(i,j)$ and $s_a(j) = \sum_i rs_a(i,j)$.

3. Using the newly updated models about the others, run the semi-modeler
strategy and propose the slot $s_m$ with the maximum expected utility.

4. If it was the last round, the game is over. Otherwise go to step 2. 



The model-updating mechanism is based on the well-known Bayes’ rule. In
this case, we have multi-valued random variables: the personality models. In
fact, a personality model $rs_a$ represents a probability distribution over
personalities, so the probability that an agent $a$ has the personality
resulting from combining role $i$ and strategy $j$, $P(per_a(i,j))$, is
precisely the value $rs_a(i,j)$ in matrix $rs_a$, and the equation used to
update each personality model can be rewritten as follows:

$$rs_a(i,j) = \frac{P(pro_a(s_a) \mid per_a(i,j))\,P(per_a(i,j))}{\sum_{x}\sum_{y} P(pro_a(s_a) \mid per_a(x,y))\,P(per_a(x,y))}$$

The prior probabilities $P(per_a(i,j))$ are taken from the last recorded value
$rs_a(i,j)$ in matrix $rs_a$. The conditional probabilities
$P(pro_a(s_a) \mid per_a(i,j))$ can be calculated from the known calendar
density and the known agent behavior due to the personality $per_a(i,j)$.
Thus, the bayesian-modeler is able to get all the posterior probabilities
from the calculated conditional probabilities and the known prior
probabilities. Then, the $rs_a$ matrix is updated with these new probabilities
in order to be used as prior probabilities in the following round.
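
One round of this updating cycle can be sketched in Python as follows. The
function names are hypothetical, the likelihood matrix is taken as given (the
text derives it from the calendar density and the behavior of each
personality, which is not reproduced here), and keeping the prior when an
observation is impossible under every personality is an assumption of the
sketch.

```python
def personality_model(role_model, strategy_model):
    """Compose rs(i, j) = r(i) * s(j), assuming independent roles and strategies."""
    return [[r * s for s in strategy_model] for r in role_model]


def bayes_update(prior, likelihood):
    """Posterior over personalities after observing one proposal.

    prior[i][j]      = P(per(i, j))
    likelihood[i][j] = P(observed proposal | per(i, j))
    """
    joint = [[likelihood[i][j] * prior[i][j] for j in range(len(prior[i]))]
             for i in range(len(prior))]
    evidence = sum(sum(row) for row in joint)
    if evidence == 0:
        return prior  # assumption: ignore an observation no personality explains
    return [[value / evidence for value in row] for row in joint]


def decompose(personality):
    """Marginalize the personality matrix back into role and strategy models."""
    role_model = [sum(row) for row in personality]
    strategy_model = [sum(column) for column in zip(*personality)]
    return role_model, strategy_model
```

Starting from equiprobable models over two roles and two strategies, an
observation that is twice as likely under the first role as under the second
moves the role model to (2/3, 1/3) and leaves the strategy model unchanged.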



5 Reinforcement Strategy 

As in the case of our bayesian-modeler agent, our reinforcement-based strategy 
keeps and incrementally updates models about the others: 

Reinforcement-Modeler Strategy: This strategy learns models about the 
others’ personalities (i.e. roles and strategies). This strategy is very similar, 
in essence, to our bayesian-modeler strategy: it learns the others’ models and 
exploits them with a greedy decision-theoretic approach. 

The idea is to keep vectors of values instead of probabilistic vectors. Keeping 
this difference in mind, we have here three models for each other agent a as in 
the bayesian-modeler strategy case. The reinforcement-modeler agent constructs 
the state signal. That is, each state is not given just by the immediate sensations 
but it is a highly processed version of the sensations. Let us see then how we 
visualize the reinforcement learning in this case: 

Discrete Time Steps: These correspond to the rounds of each game. That is,
we have 10 time steps $t = 1, 2, \ldots, 10$.

States: At each time step $t$ (at the beginning of each round), the agent
receives a representation of the environment $s_t \in S$, where $S$ is the set
of possible states. In this case each state is the set of calendars with
random free and busy slots of the other $n$ agents plus the personality models
of those $n$ agents, which are constructed by the reinforcement-modeler agent.

Actions: Based on the state, the learner will choose an action
$a_t \in A(s_t)$ from a set of possible actions $A(s_t)$ available in state
$s_t$. In this case each possible action $a_t$ is just the proposal of a free
slot in the calendar of the agent.

Rewards: As a consequence of its action, in this case the reinforcement-modeler
agent will have $n \cdot m$ rewards, $r_{ij,t+1}$, one for each other agent
$i$ (of the $n$ agents) and each of its $m$ different possible personalities
$j$. These rewards are not directly given by the referee agent; they are
computed by the reinforcement-modeler using the information of each state.

Intuitively, if an agent chooses the first slot in the morning, we have a
general human tendency to give more weight (or reward) to an early-rising role
than to a night-owl one. Inspired by this fact, the rewards used by this
strategy are not just the points earned by the agents in each round. Here, the
rewards are a function of the observed proposed slot in the previous round.
Thus, the strategy first observes the others’ choices and then calculates the
rewards for each possible personality of the other agents.

After updating all the personality values, the reinforcement-modeler agent
decomposes these models into two new separate role and strategy models for
each other agent. This decomposition is the reverse process of the personality
model composition explained above. In a similar way to the bayesian-modeler
agent, the reinforcement-modeler agent computes each element in $r_a$ and
$s_a$ as follows: $r_a(i) = \sum_j rs_a(i,j)$ and $s_a(j) = \sum_i rs_a(i,j)$.
After this, the agent just re-uses the semi-modeler strategy in order to
exploit the updated models, choosing the slot with the maximum expected
utility.



6 Experimental Results 

Here, what we call an experiment is a series of games with the same
characteristics, and groups of different and related experiments are called
experimental scenarios. At the beginning of each game, all the agents are
initialized with random roles taken from a set of two opposite roles (the
early-rising and night-owl roles presented in section 2) and eight-slot
calendars with the calendar density fixed at 50%. All the games are composed of
ten rounds (the fourth and fifth experimental scenarios are the exceptions).
Also, in all these experiments we run three agents (the exception is the second
experimental scenario). Furthermore, when we run a bayesian-modeler or a
reinforcement-modeler agent, he is always learning the models about the others
and playing the game at the same time. We have set up series of games in order
to measure how agent performance is affected by different strategies. Once a
game is completed, we call it a “success” if the strategy under consideration
wins. Otherwise it is considered a “failure”. Our experiments are composed of
500 independent games, and we have calculated that the results obtained in
these experiments have, with 95% confidence, an error not greater than about
0.05. In all tables presented here, we show the performance of each strategy
as the percentage of success.
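
The text does not say how this error bound was computed. One standard way to
obtain a figure of this size, shown here only as an assumption, is the normal
approximation to the binomial distribution with the worst-case success
probability of 0.5. The function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(n_games, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to 95% confidence; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n_games)
```

For 500 games this gives roughly 0.044, which is consistent with the bound of
about 0.05 quoted above.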

The goal of the first scenario (see table 1) is to compare the performance of
the non-modeling strategies discussed in section 3. Thus, we run here an
indifferent agent first against self-centered agents, then against
collaborative ones, and finally against both. As expected, the performance of
the indifferent strategy is always the worst, giving us a lower-limit
performance against which to compare other reasonable strategies. We
intuitively thought that the performance of the collaborative agents should be
better because they can team up with each other. However, as we can see in the
first two experiments, the self-centered strategy appears to do better than the
collaborative one against the indifferent agent. In the last experiment, we can
see that the self-centered strategy clearly outperforms the collaborative one,
while the indifferent agent’s performance is very low. As shown elsewhere
[10], when the number of agents is increased, the collaborative strategy’s
performance increases, outperforming the self-centered one.



Table 1. Comparing the basic strategies

Experimental Scenario 1

                              Strategies
Experiments       Indifferent   Self-Centered   Collaborative
Experiment 1.1       7.59%         92.41%             -
Experiment 1.2      18.15%            -             81.85%
Experiment 1.3       3.86%         80.59%           15.45%



The goal of the second experimental scenario (see table 2) is to compare
the performance of the bayesian-modeler agent discussed in section 4 with that
of the semi-modeler strategy, also presented in section 4, using fixed
opposite, equiprobable and correct models about the others. Here we run four
experiments with a self-centered agent and a collaborative one, and we vary the
strategy of the third agent in each experiment. In the first experiment we run
an oracle agent, who has the correct models about the others. In the second
one, we run a semi-modeler agent who uses fixed equiprobable models. In the
third experiment, we again run a semi-modeler agent, but now with fixed
opposite models about the others. In the last one, we finally run a
bayesian-modeler who is learning the models and playing at the same time during
the ten rounds of each game. In the first experiment of this scenario, we get
the empirical upper-limit performance given by the oracle strategy. On the
other hand, running a semi-modeler with the incorrect fixed opposite models, we
expect to have a lower-limit performance. We can see in the third experiment
that this limit is indeed so low that even the self-centered strategy is the
winner. The second experiment shows a new, more refined lower-limit
performance, given by a semi-modeler with fixed equiprobable models. Our
expectation for a good modeler, then, is a performance somewhere between the
upper limit given by the oracle and the lower limit given by the semi-modeler
with fixed equiprobable models. As we can see, our expectations were confirmed
in the last experiment.



Table 2. Comparing the bayesian and semi-modeler strategies

Experimental Scenario 2

                          Strategies
                                              Modeling
Exp    Self-Centered   Collaborative   Models          Perform.
2.1       20.58%          10.88%       Correct          68.54%
2.2       33.13%          11.31%       Equiprobable     55.56%
2.3       51.96%          20.19%       Opposite         27.85%
2.4       30.31%           7.67%       Bayesian         62.02%



The goal of the third scenario (see figure 1) is to evaluate the performance
of both the bayesian-modeler and the reinforcement-modeler agents, varying the
number of rounds needed to learn the models about the others in each
experiment. Looking at the results in this scenario, it is clear that the
performance improves as the number of rounds increases. Here, we can directly
compare the different performances we have obtained with the indifferent,
oracle, semi-modeler, bayesian-modeler and reinforcement-modeler strategies
when playing against the self-centered and collaborative ones. As we can
observe, in games with only one round the bayesian-modeler strategy starts
with a performance between the limits of the semi-modeler strategies using
fixed opposite and equiprobable models. This performance increases when we
increase the number of rounds in the games, approaching the upper limit given
by the oracle strategy. On the other hand, we can observe that the performance
of the reinforcement-modeler is slightly worse than, but very close to, that
of the bayesian-modeler. Both performances increase as the number of rounds
increases, showing a performance very close to that of the oracle just a few
rounds after the first ten rounds. In fact, in other experiments, not shown
here, we ran games as long as 100 rounds, which show that the
reinforcement-modeler performance keeps increasing slightly, approaching the
asymptotic upper-limit performance but always remaining lower than the
performance of the bayesian-modeler.



7 Conclusions 

In this paper, we presented our bayesian-modeler agent which models other 
agents in a probabilistic way, combining Bayes and decision theory, and we also 
presented an empirical evaluation of its performance. We have used a collection of 
reference points: The indifferent and oracle strategies provide the extremes of the 
spectrum, ranging from least- to most-informed strategies. We have also obtained 
other more refined reference points given by the semi-modeler strategy with 
fixed opposite and equiprobable models. Our experimental results have shown
that the performances of both the bayesian-modeler and reinforcement-modeler
strategies are indeed better than the empirical lower limit we have obtained
and, in fact, we have also observed that these performances increase as the
number of rounds increases. Our experiments have also shown that after thirteen
rounds the bayesian-modeler performance is really close to that of the oracle.
On the other hand, we could also observe that the reinforcement-modeler agent’s
performance keeps increasing but ends about 10% below the performance of the
bayesian-modeler after about 13 rounds. However, this performance is
considerably close to the optimal performance marked by the oracle agent.

Fig. 1. Comparing the bayesian and reinforcement strategies (Experimental
Scenario 3; axis label: Performance)

Abstract. Web services coordination has become a central topic for the
further development of Internet-based distributed computing. One approach to
this coordination task is supported by generative communication, and more
specifically by some implementations of the Linda model such as JavaSpaces.
However, when applying these coordination strategies to real projects, some
drawbacks appear. One of the main limitations is the lack of transactional
queries. In this paper we deal with this problem by extending the matching
mechanism of the Linda model. A variant of the well-known RETE algorithm can
then be devised in order to implement our extended Linda model efficiently.
This also opens new research lines in which Artificial Intelligence techniques
(such as advanced blackboard architectures) could be applied to the field of
web services coordination.



1 Introduction 

From a conceptual point of view, most of the mechanisms used to coordinate web 
services may be interpreted as message brokers: the information items dispatched 
for coordination can be considered as messages, and then, the integration task 
itself can be seen as a logic to deliver messages, that defines implicitly a mes- 
sage broker. Routing logics can be classified according to the sender’s identity, 
message type, or message content. They can typically be defined in a rule-like
language. In this way, message brokers provide a loosely coupled architecture for 
application integration [1]. 

The shared dataspace of the Linda model [2] is particularly well-suited for 
this interpretation, and several extensions of it have been proposed in order to 
introduce the concept of coordination into the development of middleware for 
web-based environments [3]. The use of a shared dataspace for entity
communication was first introduced in Artificial Intelligence through the
notion of the blackboard [4]. Blackboards are information spaces where
messages can be published and retrieved.

The main aim of this paper is to explore more deeply this relationship between
Coordination and Artificial Intelligence by means of an extension of the Linda
model providing a matching process with multiple templates.

The paper is structured as follows. Section 2 shows the differences between
rule-based languages and other coordination languages based on pattern
matching. Section 3 describes a formal framework for modelling an extension of the
tuple-based coordination language Linda. Section 4 presents a description of 
a Web Coordination Service (WCS). We propose an adaptation of the RETE 
algorithm to implement an extension of the matching operation with multiple 
templates. The classical example of the search and reservation of a set of travel 
tickets is used to show the role of a shared dataspace with a rule-based language. 
Finally, conclusions and future work are presented. 



2 Pattern Matching-Based Coordination 

Coordination deals with the management of dependencies between activities and 
the specification of the conditions under which a certain service may or may not
be invoked. Different coordination models have been proposed in the literature, 
for example Petri nets [5], rules [6], Linda [2], or Statecharts [7]. 

Similarities between these coordination models stem from the use of tem- 
plates to recognize particular states and represent constraints, and therefore the 
use of a pattern matching algorithm to interpret the models. The similarity be- 
tween Petri nets and rules has already been pointed out in the literature (see 
[8,9]), particularly by comparing the process of pattern matching [10,11]. On 
the other hand, Linda is based on a blackboard used as a shared dataspace. Data 
are recovered by means of pattern matching operations. 

In spite of the similarities between these models based on pattern match- 
ing, it is important to note the differences between rule-based languages and 
other coordination languages. Pure coordination languages are used to model 
process creation and coordination, procedures which are concepts orthogonal to 
computational languages. Therefore the difference is clear: a rule-based language 
represents a computational model that expresses the control flow by means of a 
pattern-directed matching algorithm, whereas coordination languages may use a 
matching algorithm to express the order and constraints in which “components” 
are involved in computation. 

Another important difference between rule-based languages and pure coor- 
dination models stems from their different origins. Rule-based languages, as a 
knowledge representation language, put the emphasis on their expressiveness and 
have evolved towards the integration of different paradigms, while pure coordi- 
nation languages propose simple operations independent of any computational 
language. In particular, this difference is reflected on the complexity of their 
corresponding matching operations. The main emphasis in rule-based languages 
has been to express sophisticated patterns to recognize different states, whereas 
in Linda, the main emphasis has been to keep simple operators, and therefore a 
simple matching process.

The expressiveness of rule-based languages has inspired a different paradigm,
the event-condition-action (ECA) paradigm, in which rules are used to monitor the
occurrence of events of interest. Once an event is received, an action is performed
if a condition, i.e., a Boolean predicate over event parameters, is satisfied. The
main drawback of rule-based languages is that large sets of rules are difficult to
understand and maintain. It is important to note that when rule-based languages
are used, there is no clear control flow. However, this characteristic makes
them suitable for defining exception-handling logic.

As an alternative approach, this paper increases the complexity of the Linda 
matching operation in order to enhance the expressiveness of Linda primitives, 
and therefore opens the possibility to express more sophisticated coordination 
restrictions. This extension is inspired by rule-based languages, but what is proposed
is a proper extension of the Linda model that keeps its primitives simple. From this
point of view, the extension is similar to High Level Petri nets, which provide 
more compact and manageable descriptions than ordinary Petri nets [12]. 



3 Extending the Linda Model to Web Services Coordination



Linda was the first coordination model that actually adopted a dataspace
with associative access for coordination purposes [2]. Linda introduces an abstraction
for concurrent programming through a small set of coordination operations
combined with a shared dataspace containing tuples. The out operation
emits a tuple into the tuple space. The in and rd operations retrieve a tuple. A
matching rule governs tuple selection from the tuple space in an associative way.
In [13] a formal framework for tuple-based coordination models is developed;
the tuple-based coordination medium is represented as a software component
interacting with the coordinated entities by receiving input events and sending
output events. The main ingredients of this model are: a set of tuples $t$ ranging
over $T$; a set of templates $templ$ ranging over $Templ$; a matching predicate
$mtc(templ, t)$ between templates and tuples; and a choice predicate $\mu(templ, \bar{t}, t)$
extracting a tuple $t$ from the multiset of tuples $\bar{t}$ that matches a template $templ$,
or returning an error element $\bot_T$ if no matching is available. This is formally
defined as:

$$\frac{mtc(templ, t)}{\mu(templ,\; t|\bar{t},\; t)} \qquad \frac{\nexists\, t \in \bar{t} : mtc(templ, t)}{\mu(templ,\; \bar{t},\; \bot_T)}$$
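The matching and choice predicates can be illustrated with a minimal Python sketch. The representation is an assumption made only for illustration: tuples are dictionaries of attribute/value pairs, a template is a dictionary in which `None` acts as a wildcard, and the names `mtc`, `mu` and `BOTTOM` are hypothetical stand-ins for the predicates and the error element of the formal model in [13].

```python
import random

BOTTOM = None  # stands for the error element returned when nothing matches


def mtc(templ, t):
    """Matching predicate: every attribute of the template must be present
    in the tuple and carry the same value, unless the template leaves the
    value open (None is used here as a wildcard)."""
    return all(a in t and (v is None or t[a] == v) for a, v in templ.items())


def mu(templ, space, rng=random):
    """Choice predicate: pick, non-deterministically, one tuple of the
    multiset `space` that matches `templ`, or BOTTOM if there is none."""
    candidates = [t for t in space if mtc(templ, t)]
    return rng.choice(candidates) if candidates else BOTTOM


space = [{"type": "fly", "to": "Madrid"}, {"type": "train", "to": "Zaragoza"}]
found = mu({"type": "train", "to": None}, space)
missing = mu({"type": "bus"}, space)
```

The random choice mirrors the non-determinism of the choice predicate: when several tuples match, any of them may be returned.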

The status of a tuple space at a given time is characterized as a labelled
transition system by the couple $(\bar{t}, \bar{w})$, where $\bar{t}$ is a multiset of tuples and $\bar{w}$ is a
multiset of pending queries. A pending query is an input event $ie$ ranging over
$IE \subseteq W$, where $W$ is a set of operations such as the reading operation rd,
the reading and removing operation in, or the writing operation out, which are
possibly waiting to be served by a tuple space. Output events $oe$ range over $OE$,
with the syntax $ov$ representing a message $v$ for entity $o$.

The semantics of a tuple space is defined by the couple $(S, E)$. $S \subseteq W \times \bar{T}$
is a satisfaction predicate for queries, so that $(w, \bar{t}) \in S$ means $w$ is satisfiable
under the current space's content $\bar{t}$. $E : W \to 2^{(\bar{T} \times \bar{W}) \times OE \times (\bar{T} \times \bar{W})}$ is an evaluation
function, so that $(\bar{t}, \bar{w}, oe, \bar{t}', \bar{w}') \in E(w)$ means that the evaluation of the
pending query $w$ causes the tuple space in state $(\bar{t}, \bar{w})$ to move to $(\bar{t}', \bar{w}')$ and
produce output event $oe$ (or nothing).

Predicate $S$ and evaluation function $E$ are defined as the least relations satisfying
a set of rules for every pending query corresponding to each primitive
operation allowed. For instance, in the case of the read operation $rd(templ)^o$,
the following rules must be satisfied:

$$\frac{mtc(templ, t)}{S(rd(templ)^o,\; t|\bar{t})} \qquad \frac{mtc(templ, t)}{(t|\bar{t},\; \bar{w},\; ot,\; t|\bar{t},\; \bar{w}) \in E(rd(templ)^o)}$$
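The state of the medium (a multiset of tuples plus a multiset of pending queries) and the way a satisfiable query is evaluated can be sketched as follows. This is a simplified illustration under stated assumptions, not the formal semantics of [13]: the class `TupleSpace`, its `outbox` list of output events and the deterministic "first match" choice are all hypothetical.

```python
class TupleSpace:
    """Toy coordination medium: tuples, pending queries and output events."""

    def __init__(self):
        self.tuples = []    # multiset of tuples
        self.pending = []   # multiset of pending queries (op, template, entity)
        self.outbox = []    # output events as (entity, tuple) pairs

    @staticmethod
    def mtc(templ, t):
        # None in a template is used as a wildcard (an assumption)
        return all(a in t and (v is None or t[a] == v) for a, v in templ.items())

    def _serve(self):
        """Evaluate pending queries for as long as one of them is satisfiable."""
        progress = True
        while progress:
            progress = False
            for query in list(self.pending):
                op, templ, entity = query
                match = next((t for t in self.tuples if self.mtc(templ, t)), None)
                if match is None:
                    continue                    # not satisfiable: stays pending
                if op == "in":
                    self.tuples.remove(match)   # in removes the tuple, rd does not
                self.pending.remove(query)
                self.outbox.append((entity, match))
                progress = True

    def out(self, t):
        self.tuples.append(t)
        self._serve()

    def rd(self, templ, entity):
        self.pending.append(("rd", templ, entity))
        self._serve()

    def in_(self, templ, entity):
        self.pending.append(("in", templ, entity))
        self._serve()
```

A query issued before any matching tuple exists simply stays pending; it is served as soon as a later `out` makes it satisfiable.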

In order to introduce Linda into a web-based environment, we work with a 
version of Linda where tuples and templates admit descriptions as sequences 
of attribute/value pairs, which is similar to the “top level” structure of any 
XML document. Now, in this XML-like context, Linda admits a more complex 
matching mechanism providing a promising way to coordinate, communicate and 
collaborate on the net [14]. 

Furthermore, we extend the Linda model provided in [13] by adding operations
with multiple templates in which two or more tuples are involved: $rd_2$, $in_2$,
$rd_1 in_1$, $rd_2 in_1$, and so on. The aim is to enrich Linda with some transactional
capabilities (see an example in Subsection 4.3). For instance, the double tuple
reading operation $rd_2(templ_1, templ_2)^o$ is sent by a coordinated entity
$o$ that wants to read two tuples from the tuple space so that the tuples match
the respective templates and share some common attributes. In this case, we need an
extended matching predicate $mtc_2$ between two templates and two tuples. The
predicate is defined as:

$$mtc_2(templ_1, t_1, templ_2, t_2) = mtc(templ_1, t_1) \wedge mtc(templ_2, t_2)$$

under the constraint:

$$select(templ_1, t_1, a) = select(templ_2, t_2, a) \quad \forall a \in sharing(templ_1, templ_2)$$

where $sharing : Templ \times Templ \to List(A)$ gives the list of common attributes and
$select : Templ \times T \times A \to V$ gives the value $v$ of the attribute $a$ of template
$templ$ provided by a tuple $t \in T$.
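A small Python sketch may help to read the extended predicate. It rests on one interpretation that is an assumption of this sketch: the "common attributes" are modelled as variables (strings starting with `?`) that occur in both templates, so that `sharing` returns those variables and `select` returns the value a tuple provides for the attribute bound to a given variable.

```python
def is_var(v):
    return isinstance(v, str) and v.startswith("?")


def mtc(templ, t):
    """Single-template matching: constants must coincide, variables match anything."""
    return all(a in t and (is_var(v) or t[a] == v) for a, v in templ.items())


def sharing(templ1, templ2):
    """Variables that occur in both templates (the common attributes)."""
    vars1 = {v for v in templ1.values() if is_var(v)}
    return sorted(v for v in set(templ2.values()) if is_var(v) and v in vars1)


def select(templ, t, var):
    """Value that tuple t provides for the attribute bound to `var` in templ."""
    attr = next(a for a, v in templ.items() if v == var)
    return t[attr]


def mtc2(templ1, t1, templ2, t2):
    """Both tuples match their templates and agree on every shared variable."""
    return (mtc(templ1, t1) and mtc(templ2, t2)
            and all(select(templ1, t1, a) == select(templ2, t2, a)
                    for a in sharing(templ1, templ2)))


templ1 = {"from": "London", "to": "?port"}
templ2 = {"from": "?port", "to": "Madrid"}
t1 = {"from": "London", "to": "Paris"}
t2 = {"from": "Paris", "to": "Madrid"}
t3 = {"from": "Rome", "to": "Madrid"}
```

Here `t1` and `t2` form a consistent pair because both bind `?port` to Paris, whereas `t1` and `t3` match their templates individually but disagree on the shared variable.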

Also, we can consider a predicate $\mu_2(templ_1, templ_2, \bar{t}, t_1|t_2)$ that chooses two
tuples $t_1|t_2$ which match two templates, $templ_1$ and $templ_2$, from a tuple multiset
$\bar{t}$, defined as (with $\bot_T^2 = (\bot_T, \bot_T)$):

$$\frac{mtc_2(templ_1, t_1, templ_2, t_2)}{\mu_2(templ_1, templ_2,\; t_1|t_2|\bar{t},\; t_1|t_2)} \qquad \frac{\nexists\, t_1 \in \bar{t} \ \text{or}\ \nexists\, t_2 \in \bar{t} : mtc_2(templ_1, t_1, templ_2, t_2)}{\mu_2(templ_1, templ_2,\; \bar{t},\; \bot_T^2)}$$

The semantics of $rd_2$ is described by the following rules, where $S$ is the satisfaction
predicate and $E$ is the evaluation function:

$$\frac{mtc_2(templ_1, t_1, templ_2, t_2)}{S(rd_2(templ_1, templ_2)^o,\; t_1|t_2|\bar{t})} \qquad \frac{mtc_2(templ_1, t_1, templ_2, t_2)}{(t_1|t_2|\bar{t},\; \bar{w},\; ot_1|ot_2,\; t_1|t_2|\bar{t},\; \bar{w}) \in E(rd_2(templ_1, templ_2)^o)}$$

Note that we should also introduce operations that read $n + m$ tuples and
remove only the last $m$: $rd_n in_m(templ_1, \ldots, templ_n, templ_{n+1}, \ldots, templ_{n+m})^o$.
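The general operation that reads n tuples and removes m more can be sketched as a backtracking search over the tuple multiset. This is a naive illustration under assumptions (variables are strings starting with `?`, the space is a Python list, and the function name `rd_in` is hypothetical); it is not the RETE-based implementation discussed in Section 4.

```python
def is_var(v):
    return isinstance(v, str) and v.startswith("?")


def bind(templ, t, env):
    """Match t against templ under the bindings env; return the extended
    bindings, or None when the match fails."""
    env = dict(env)
    for a, v in templ.items():
        if a not in t:
            return None
        if is_var(v):
            if env.setdefault(v, t[a]) != t[a]:
                return None     # variable already bound to another value
        elif t[a] != v:
            return None
    return env


def rd_in(space, read_templs, take_templs):
    """Find distinct tuples matching all templates with consistent bindings.
    Tuples matched by take_templs are removed; the others are only read.
    Returns the matched tuples, or None (space untouched) if no set exists."""
    templs = list(read_templs) + list(take_templs)

    def search(i, env, used):
        if i == len(templs):
            return used
        for k, t in enumerate(space):
            if k in used:
                continue
            new_env = bind(templs[i], t, env)
            if new_env is not None:
                result = search(i + 1, new_env, used + [k])
                if result is not None:
                    return result
        return None

    chosen = search(0, {}, [])
    if chosen is None:
        return None
    result = [space[k] for k in chosen]
    for k in sorted(chosen[len(read_templs):], reverse=True):
        del space[k]            # only tuples matched by take templates are removed
    return result
```

The search is exponential in the worst case, which is one motivation for the incremental matching structure of Section 4.2.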

4 Design and Implementation 

As mentioned above, in order to introduce Linda into a web-based
environment it is necessary to make Linda primitives accessible
through Internet protocols (HTTP, SMTP, SOAP), encode data in XML format,
and integrate reactive behavior. In this Section, the design and implementation
of a Web Coordination Service (WCS) based on the Linda model is presented.
It plays the role of a message broker in the development of middleware for
web-based environments.

The implementation is based on an implementation of Linda called JavaSpaces
Technology [15]. In JavaSpaces, a collection of processes may cooperate through
the flow of Java objects (representing tuples) in a network-accessible shared
space. In addition, the Jini Distributed Event model is incorporated into JavaSpaces
for firing events when objects that match templates are written into a space. This
makes it possible to react to the arrival of input events when they are placed in the space.

First, Subsection 4.1 shows implementation details to provide Linda primi- 
tives through web protocols and extend the simple JavaSpaces matching opera- 
tion to XML data. A more detailed explanation of this implementation may be 
found in [16, 17]. Then, in Subsection 4.2, a structure inspired by the RETE algo- 
rithm is proposed to support the extended matching based on multiple templates 
similar to rule-based languages presented in Section 3. 

4.1 Extending JavaSpaces to Support XML Tuples, Web Protocols 
and Reactive Behavior 

The WCS is composed of three software components (see Figure 2): 

The Web Coordination component operates as a simple web-accessible 
interface of the Java Coordination component, providing the same collection of 
operations as JavaSpaces through web protocols (HTTP, SMTP, SOAP).

The Java Coordination component is the core of the service. It provides 
the collection of writing and reading operations proposed by Linda and allows 
processes to publish XML tuples and subscribe their interest in receiving XML 
tuples of a specific XML schema, encouraging a reactive style of cooperation 
between processes. This component is a repository of agents (here, an agent is a 
computational entity that acts on behalf of other entities in a semi-autonomous 
way and performs its tasks cooperating with other agents) that are capable of 
coordinating with other agents by exchanging XML tuples through XML-based 
spaces (more details in [17]). 

Finally, the interactions between the web service and the applications oc- 
cur in the XML-based Space component. XML data representing tuples are 
stored in JavaSpaces as Java objects. JavaSpaces provides a shared repository of 
Java objects and operations to read and take objects from and write them to it
(they implement the rd, in and out Linda operations). However, the matching rules
of JavaSpaces must be extended to support this XML-based interoperability.

To solve this problem, the matching process is performed in two steps. In the
first step, a matching based on the XML-Schema (the type of tuple) is performed by the
original JavaSpaces matching primitive. In the second step, a specific matching
method of the retrieved object that represents the same XML-Schema is invoked
using the original template as the actual parameter. This method checks whether the
field values of the nodes are the same. Other approaches to enhancing the matching
process follow the same strategy; for example, it is used in [18] to exchange encrypted
data among distributed processes.

Note that this matching proposal in two steps does not guarantee the se- 
mantics of the Linda model. To solve this semantic problem, all the tuples with 
the same XML-Schema are stored in the same partition. A new Java object 
called Channel has been designed to manage partitions. It must guarantee a 
non-deterministic access to all the tuples stored in a given partition. Figure 2 
shows an example of tuple-space partitioned by three different Channels. In [17] 
the efficiency of the proposed pattern is evaluated, and an additional partitioning 
of the interaction space to improve efficiency is proposed. 
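The two-step matching over partitions can be sketched in Python as follows. The class names `XMLTuple` and `PartitionedSpace` are hypothetical, and the sketch does not use the JavaSpaces API: a dictionary keyed by schema name plays the role of the Channels, and `None` in a template is assumed to be a wildcard.

```python
from collections import defaultdict


class XMLTuple:
    def __init__(self, schema, **fields):
        self.schema = schema      # XML-Schema name, i.e. the type of the tuple
        self.fields = fields      # top-level attribute/value pairs

    def matches(self, template):
        """Step 2: field-by-field comparison; None in the template is a wildcard."""
        return all(v is None or self.fields.get(k) == v
                   for k, v in template.fields.items())


class PartitionedSpace:
    def __init__(self):
        self.channels = defaultdict(list)   # one partition per XML-Schema

    def write(self, entry):
        self.channels[entry.schema].append(entry)

    def read(self, template):
        # Step 1: schema-based selection (.get avoids creating empty partitions)
        partition = self.channels.get(template.schema, [])
        # Step 2: every tuple of the partition is a candidate for the value check
        return next((e for e in partition if e.matches(template)), None)


space = PartitionedSpace()
space.write(XMLTuple("flight", origin="Rome", dest="Madrid"))
space.write(XMLTuple("flight", origin="London", dest="Paris"))
hit = space.read(XMLTuple("flight", origin="London", dest=None))
```

Because the second step inspects the whole partition rather than a single retrieved object, a matching tuple is found whenever one exists, which is the property the Channel object is meant to guarantee.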

4.2 The Role of the RETE Algorithm 

This Section presents the extension of the matching process to multiple tem- 
plates formally introduced in Section 3 in a way similar to rule-based languages. 
This operation is the main source of inefficiencies in pattern-directed inference 
engines, and algorithms such as RETE [19] and TREAT [20] have been proposed 
to improve the efficiency of this process. 

The underlying aim of these algorithms is to improve efficiency by match- 
ing only the changed data elements against the rules rather than repeatedly 
matching all the templates against all the data. With this purpose in mind, the 
information about previous matchings is recorded in a graph structure. 

For example, the RETE network is composed of a global test tree common 
to all the rules, and a join tree specific to each rule. The test tree is composed of 
one-input nodes (called α-memories), each of them representing a test over some
attribute. The information related to the binding consistency test is represented
by the join tree associated to the rule. The join tree is composed of two-input
nodes (called join nodes or β-memories). Join nodes collect data pointers for a
consistent binding of template variables. The last join node collects all the data 
elements to which the associated rule can be applied. Figure 1 shows the RETE 
network resulting from the compilation of a sample rule in CLIPS syntax. 

When a datum is added to the working memory, a new pointer is introduced 
into the root of the test tree and propagated if the test is successful. 

In our case, there are two reasons to recover the RETE structure in order
to support our extended matching: 1) to provide an efficient algorithm
to support the proposed extended matching; and 2) to use the RETE structure
as a base to support the semantics of the extended matching operation
with several templates. To this end, the last join nodes associated with each
reading operation with several templates collect all the data that satisfy the
templates, and also the blocked processes waiting for data (tuples) that match
these templates.

Fig. 1. A sample of a RETE Network

When a reading operation with several templates is performed, an α-memory
is created for each template (see Figure 2). Each α-memory is subscribed to
the corresponding Channel (XML-Schema) and filters the data that match the
template. Intermediary β-memories collect data pointers for a consistent binding
of a template; and the last β-memory collects all the possible combinations of
tuples in the XML Space that satisfy the operation. An alternative approach is
to save only the last β-memory as in the TREAT algorithm.
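The interplay of α-memories and β-memories can be illustrated with a simplified Python sketch. It is a linear chain of joins written for illustration only, not the RETE algorithm of [19] nor the authors' implementation: the class `JoinNetwork`, the `?` variable convention and the absence of tuple removal are all assumptions.

```python
def is_var(v):
    return isinstance(v, str) and v.startswith("?")


def bind(templ, t, env):
    """Extend the bindings env so that t matches templ, or return None."""
    env = dict(env)
    for a, v in templ.items():
        if a not in t:
            return None
        if is_var(v):
            if env.setdefault(v, t[a]) != t[a]:
                return None
        elif t[a] != v:
            return None
    return env


class JoinNetwork:
    def __init__(self, templs):
        self.templs = templs
        self.alpha = [[] for _ in templs]  # alpha[i]: tuples passing template i
        self.beta = [[] for _ in templs]   # beta[i]: consistent matches of templates 0..i

    def add(self, t):
        """Only the new tuple is tested; stored partial matches are reused."""
        for i, templ in enumerate(self.templs):
            if bind(templ, t, {}) is None:
                continue                   # filtered out by the alpha test
            self.alpha[i].append(t)
            lefts = [({}, ())] if i == 0 else list(self.beta[i - 1])
            new = []
            for env, combo in lefts:
                if any(t is u for u in combo):
                    continue               # a tuple is used once per combination
                e = bind(templ, t, env)
                if e is not None:
                    new.append((e, combo + (t,)))
            self._propagate(i, new)

    def _propagate(self, i, new):
        self.beta[i].extend(new)
        if i + 1 == len(self.templs) or not new:
            return
        nxt = []
        for env, combo in new:
            for u in self.alpha[i + 1]:
                if any(u is c for c in combo):
                    continue
                e = bind(self.templs[i + 1], u, env)
                if e is not None:
                    nxt.append((e, combo + (u,)))
        self._propagate(i + 1, nxt)

    def complete(self):
        """Content of the last beta-memory: full combinations of tuples."""
        return [combo for _, combo in self.beta[-1]]
```

Keeping every intermediate `beta[i]` corresponds to the RETE-style choice described above; storing only the last one would be closer to the TREAT alternative.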

If there is a set of tuples that matches the reading template, it will be found
and propagated until the corresponding β-memory; otherwise, the reading process
will be blocked. Reading processes are blocked until a set of tuples reaches
the last β-memory. Then, the reading of all the tuples must be performed as an
atomic action, i.e. through a $rd_n$ operation, because a concurrent operation may
remove some of the tuples that match the templates.

The JavaSpaces Technology uses the Jini Transaction model, which provides 
a generic transaction service for distributed computing. To perform all the read- 
ing operations as atomic actions, all of them are performed within the same 
transaction context. If all the reading operations are successful, the waiting process
will be unblocked. Otherwise, if a reading operation fails, the transaction
will be automatically aborted and the waiting processes will be blocked until a
new set of tuples reaches the last β-memory.

As pointed out in Section 3, a semantic aspect must be considered
when several templates are used in an atomic action and only a subset of the tuples
matching them must be removed (i.e. through a $rd_n in_m$ operation). In that case, it
is necessary to mark the set of templates that imply the removal of the tuples
that match them. These templates will be translated into the corresponding take
operations, and the rest into read operations over JavaSpaces. The use of the
support of JavaSpaces for transactions prevents any inconsistency. Once a tuple
is read under a transaction, it cannot be involved in a new transaction until the
first transaction is completed.

Fig. 2. Use of a RETE Network to support the extended matching
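The all-or-nothing behaviour described above can be mimicked with a toy transaction in Python. This is only a sketch of the idea: the classes `Space`, `Transaction` and `Aborted` and the function `atomic_rd_in` are hypothetical and do not reproduce the Jini Transaction API; locking follows the simplified rule stated in the text, namely that a tuple touched under one transaction is unavailable to any other until the first completes.

```python
class Aborted(Exception):
    pass


class Space:
    def __init__(self, tuples):
        self.tuples = list(tuples)
        self.locked = {}            # id(tuple) -> transaction holding it


class Transaction:
    def __init__(self, space):
        self.space = space
        self.touched = []           # tuples locked by this transaction
        self.to_take = []           # tuples to remove on commit

    def _lock(self, t):
        present = any(t is u for u in self.space.tuples)
        owner = self.space.locked.get(id(t))
        if not present or (owner is not None and owner is not self):
            self.abort()
            raise Aborted("tuple missing or held by another transaction")
        self.space.locked[id(t)] = self
        self.touched.append(t)

    def read(self, t):
        self._lock(t)

    def take(self, t):
        self._lock(t)
        self.to_take.append(t)

    def commit(self):
        self.space.tuples = [u for u in self.space.tuples
                             if not any(u is t for t in self.to_take)]
        self._release()

    def abort(self):
        self._release()

    def _release(self):
        for t in self.touched:
            self.space.locked.pop(id(t), None)
        self.touched, self.to_take = [], []


def atomic_rd_in(space, reads, takes):
    """Read `reads` and remove `takes` atomically; False means aborted."""
    tx = Transaction(space)
    try:
        for t in reads:
            tx.read(t)
        for t in takes:
            tx.take(t)
    except Aborted:
        return False
    tx.commit()
    return True
```

When the operation aborts, nothing is removed and all locks taken so far are released, so the waiting process can retry once a new combination of tuples becomes available.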

Finally, it is important to note the difference between our approach and the 
classical use of the RETE algorithm. In classical rule-based languages, rules are 
defined and compiled to generate the RETE network from the beginning and 
the structure remains immutable. In our proposal, each reading operation implies 
building a temporary structure that is removed afterwards. The structure only remains
if a subscription operation is performed. On the other hand, the main reason for 
the original RETE network is its efficiency in the pattern matching process. In 
our proposal, the network and its Channels provide a way to filter tuples and 
perform a test of binding consistency. 

4.3 Example of the Use of Extended Matching 

A classical example is the search and reservation of a set of tickets to travel from
an origin to a destination. Let us assume that airlines introduce into a Web Coordination
Service free seats to be booked, whereas railway companies publish
timetable information. If a client is looking for train or flight tickets with intermediate
stopovers, flight tickets should be removed and train combinations should be
read. However, the conversations (protocols) between clients and companies in order
to reserve and pay for reservations are outside the scope of this paper. The following
multiple template reading operation represents the constraints imposed:

read <*[(type fly) (from, London) (to, ?FirstPort)
        (depart ?depart&After (?depart 8:30 0:00))
        (arrive ?arriveFirstPort) (ticket_number ?tickectfirstfly1)]
      *[(type fly) (from, ?FirstPort) (to, Madrid)
        (depart ?depart&After (?depart ?arriveFirstPort 2:00))
        (arrive ?arriveSecondPort) (ticket_number ?tickectfirstfly2)]
       [(type train) (from, Madrid) (to, Zaragoza)
        (depart ?depart&After (?depart ?arriveSecondPort 1:00))
        (ticket_number ?tickectfirstfly2)]
     >

The asterisk before the template denotes that the tuple should be removed. 
In our implementation, a tuple that must be removed will be translated into a 
JavaSpaces take operation. Otherwise, it will be translated into a read oper- 
ation. And all the JavaSpaces operations involved in a reading operation with 
several templates will be performed within the same transactional context. 

5 Conclusions and Further Work 

In this paper an extension of the Linda model aimed at improving its usability 
in real applications of web services coordination has been presented. In a rather
unexpected way, a parallelism between generative communication
and rule-based systems has also been found. A well-known technique taken from Artificial
Intelligence, namely the RETE algorithm, can be used to obtain an efficient 
implementation of our theoretical proposal. 

In order to introduce our ideas, a formal approach has been adopted. The
relevance of this formal model, which is only partially sketched here, is twofold:
on the one hand, it facilitates a proper extension of the Linda model, and not
only a more or less fuzzy variant of it (this goal is relevant due to the existence
of technological tools based on Linda, such as JavaSpaces); on the other hand, the
formal model is simple enough to devise cleaner algorithms (in fact, the role of
the RETE algorithm was discovered in this way).

Apart from the attribute/value, or more generally XML enhancement (which
was previously introduced elsewhere [14]), our main contribution is the transac- 
tional capabilities supplied to the Linda model. Even if limited, these capabilities 
can be implemented on JavaSpaces and be used in non-trivial real applications, 
as shown in Subsections 4.2 and 4.3, respectively. 

Further work should be done in order to fully implement our extended Linda 
model by including our RETE-like algorithm. In addition, more studies are 
needed in order to link the general field of web services coordination with advanced
blackboard architectures and other related Artificial Intelligence areas.

Abstract. Early work on Case Based Reasoning reported in the literature shows
the importance of soft computing techniques applied to different stages of the 
classical 4-step CBR life cycle. This paper proposes a reduction technique 
based on Rough Sets theory that is able to minimize the case base by analyzing 
the contribution of each feature. Inspired by the application of the minimum de- 
scription length principle, the method uses the granularity of the original data to 
compute the relevance of each attribute. The rough feature weighting and selec- 
tion method is applied as a pre-processing step previous to the generation of a 
fuzzy rule base that can be employed in the revision phase of a CBR system. 
Experiments using real oceanographic data show that the proposed reduction 
method maintains the accuracy of the employed fuzzy rules, while reducing the 
computational effort needed in its generation and increasing the explanatory 
strength of the fuzzy rules. 



1 Introduction and Motivation 

Case-Based Reasoning (CBR) systems solve problems by reusing the solutions to 
similar problems stored as cases in a case base [1] (also known as memory). However,
these systems are sensitive to the cases present in the case memory, and
their accuracy often depends on the significance of these stored cases. Therefore, in CBR
systems it is always important to reduce noisy cases in order to achieve a good 
generalisation accuracy. 

Case Based Reasoning is an application area where soft computing tools have had a 
significant impact during the past decade [2]. These soft computing techniques (fuzzy
logic, artificial neural networks, genetic algorithms and rough sets, mainly) work in 
parallel for enhancing the problem-solving ability of each other [3]. 

In this paper we propose a reduction technique based on Rough Set theory applied 
to the revision stage of CBR systems. The proposed method was introduced into our 
Case-Based forecasting platform called FSfRT, specialized in making predictions for
changing environments. Presently, the FSfRT platform is able to combine artificial
neural networks and fuzzy logic in the framework of a CBR system. 

The paper is structured as follows: section 2 introduces the Rough Set theory 
grounding; section 3 details the proposed Rough Set reduction technique; section 4 
describes the Case-Based Reasoning platform used in this study; section 5 exposes the 
test bed of the experiments and the results obtained; finally, section 6 presents the 
conclusions and further work. 



2 Rough Set Theory 

Rough Set theory, proposed by Pawlak [4,5], is an attempt to provide a formal frame- 
work for the automated transformation of data into knowledge. It is based on the idea 
that any inexact concept (for example, a class label) can be approximated from below 
and from above using an indiscernibility relationship. Pawlak [6] points out that one 
of the most important and fundamental notions to the Rough Set philosophy is the 
need to discover redundancy and dependencies between features. 

The main advantages of Rough Set theory are that it: (i) provides efficient algo- 
rithms for discovering hidden patterns in data; (ii) identifies relationships that would 
not be found while using statistical methods; (iii) allows the use of both qualitative 
and quantitative data; (iv) finds the minimal sets of data that can be used for classifica- 
tion tasks; (v) evaluates the significance of data and (vi) generates sets of decision 
rules from data. 

2.1 Basic Concepts and Definition 

Briefly, the relevant Rough Set terminology is stated below. An information system is 
a pair S = (U, A), where U is a non-empty and finite set, called the universe, and A is a 
non-empty, finite set of attributes (or features). An equivalence relation, referred to as 
indiscernibility relation, is associated with every subset of attributes $P \subseteq A$. This relation
is defined as:

$$IND(P) = \{(x, y) \in U \times U : \text{for every } a \in P,\; a(x) = a(y)\}. \quad (1)$$
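As an illustration of Equation (1), the equivalence classes induced by the indiscernibility relation can be computed by grouping objects that agree on every attribute of P. The function name `partition` and the toy universe below are hypothetical and only serve the example.

```python
from collections import defaultdict


def partition(universe, attrs):
    """Equivalence classes of IND(attrs): objects are grouped together when
    they agree on every attribute in attrs."""
    classes = defaultdict(list)
    for name, obj in universe.items():
        classes[tuple(obj[a] for a in attrs)].append(name)
    return sorted(classes.values())


U = {  # hypothetical toy data
    "x1": {"temp": "high", "wind": "low"},
    "x2": {"temp": "high", "wind": "low"},
    "x3": {"temp": "low", "wind": "low"},
}
by_temp = partition(U, ["temp"])
by_wind = partition(U, ["wind"])
```

With this data, `temp` separates `x3` from the other two objects, whereas `wind` leaves all three objects indiscernible.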

Given any subset of features $P$, any concept $X \subseteq U$ can be defined approximately
by the employment of two sets, called lower and upper approximations. The lower
approximation, denoted by $\underline{P}X$, is the set of objects in $U$ which can be certainly classified
as elements in the concept $X$ using the set of attributes $P$, and is defined as follows:

$$\underline{P}X = \bigcup \{Y \in U / IND(P) : Y \subseteq X\}. \quad (2)$$

The upper approximation, denoted by $\overline{P}X$, is the set of elements in $U$ that can be
possibly classified as elements in $X$, formally:

$$\overline{P}X = \bigcup \{Y \in U / IND(P) : Y \cap X \neq \emptyset\}. \quad (3)$$

The degree of dependency of a set of features $P$ on a set of features $R$ is denoted by
$\gamma_R(P)$, $0 \leq \gamma_R(P) \leq 1$, and is defined as:

$$\gamma_R(P) = \frac{card(POS_R(P))}{card(U)} \quad (4)$$

where

$$POS_R(P) = \bigcup_{X \in U / IND(P)} \underline{R}X. \quad (5)$$

$POS_R(P)$ contains the objects of $U$ which can be classified as belonging to one of
the equivalence classes of $IND(P)$, using only features from the set $R$. If $\gamma_R(P) = 1$,
then $R$ functionally determines $P$.
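Equations (2) to (5) can be checked on a small example. The following sketch is an illustration under assumptions: the function names and the toy universe are hypothetical, and objects are identified by string keys.

```python
from collections import defaultdict


def partition(universe, attrs):
    """Equivalence classes of IND(attrs) as a list of sets of object names."""
    classes = defaultdict(set)
    for name, obj in universe.items():
        classes[tuple(obj[a] for a in attrs)].add(name)
    return list(classes.values())


def lower(universe, attrs, concept):
    """Union of the classes entirely contained in the concept (Eq. 2)."""
    return set().union(*[c for c in partition(universe, attrs) if c <= concept])


def upper(universe, attrs, concept):
    """Union of the classes that intersect the concept (Eq. 3)."""
    return set().union(*[c for c in partition(universe, attrs) if c & concept])


def dependency(universe, R, P):
    """Degree of dependency of P on R (Eqs. 4 and 5)."""
    pos = set()
    for X in partition(universe, P):
        pos |= lower(universe, R, X)
    return len(pos) / len(universe)


U = {  # hypothetical toy decision data
    "x1": {"a": 0, "b": 0, "d": "yes"},
    "x2": {"a": 0, "b": 1, "d": "no"},
    "x3": {"a": 1, "b": 0, "d": "yes"},
    "x4": {"a": 1, "b": 1, "d": "yes"},
}
yes = {"x1", "x3", "x4"}
```

Using only attribute `a`, objects `x1` and `x2` are indiscernible but carry different decisions, so they fall outside the positive region; adding `b` resolves the conflict.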

Various extensions have been defined from the basic model proposed by Pawlak.
Among these extensions, the Variable Precision Rough Set model (VPRS)
[7] stands out; it is a generalisation that introduces a controlled degree of uncertainty within
its formalism. This degree is established by an additional parameter $\phi$.



2.2 Rough Sets as Reduction Technique 

A major feature of the Rough Set theory is to find the minimal sets of data that can be
used for classification tasks. In this sense, the notions of core and reduct of knowledge
are fundamental for reducing knowledge while preserving information. After stating the
formal definitions of these concepts, the reduction process proposed by
the methodology is outlined.

$P$ is an independent set of features if there does not exist a strict subset $P'$ of $P$ such
that $IND(P) = IND(P')$. A set $R \subseteq P$ is a reduct of $P$ if it is independent and $IND(R) =
IND(P)$. Each reduct has the property that a feature cannot be removed from it without
changing the indiscernibility relation. Many reducts for a given set of features $P$
may exist. The set of attributes belonging to the intersection of all reducts of $P$ is
called the core of $P$:

$$core(P) = \bigcap_{R \in Reduct(P)} R. \quad (6)$$

An attribute $a \in P$ is indispensable if $IND(P) \neq IND(P \setminus \{a\})$. The core of $P$ is
the union of all the indispensable features in $P$.
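The definitions of reduct and core can be verified by brute force on a small attribute set. The sketch below enumerates subsets by increasing size, which is only feasible for a handful of attributes; the function names and the toy data (in which attribute `c` duplicates `a`) are hypothetical.

```python
from itertools import combinations


def ind_classes(universe, attrs):
    """The partition U / IND(attrs) as a set of frozensets."""
    groups = {}
    for name, obj in universe.items():
        groups.setdefault(tuple(obj[a] for a in attrs), set()).add(name)
    return {frozenset(g) for g in groups.values()}


def reducts(universe, P):
    """All minimal subsets R of P with IND(R) = IND(P)."""
    target = ind_classes(universe, P)
    found = []
    for size in range(0, len(P) + 1):
        for R in combinations(P, size):
            preserves = ind_classes(universe, R) == target
            # supersets of an already found reduct are not independent
            if preserves and not any(set(r) <= set(R) for r in found):
                found.append(R)
    return found


def core(universe, P):
    """Attributes whose removal changes the indiscernibility relation."""
    full = ind_classes(universe, P)
    return {a for a in P
            if ind_classes(universe, [x for x in P if x != a]) != full}


U = {  # hypothetical toy data: c carries the same information as a
    "x1": {"a": 0, "b": 0, "c": 0},
    "x2": {"a": 0, "b": 1, "c": 0},
    "x3": {"a": 1, "b": 0, "c": 1},
    "x4": {"a": 1, "b": 1, "c": 1},
}
```

For this data there are two reducts, (a, b) and (b, c), and their intersection {b} coincides with the set of indispensable attributes, as Equation (6) states.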

The reduction technique stated by the methodology is especially suitable for reducing
decision tables. A decision table is an information system of the form $S =
(U, A \cup \{d\})$, where $d \notin A$ is a distinguished attribute called the decision attribute
or class attribute. The elements of the set $A$ are referred to as condition attributes. A
decision table is a classifier that has as its internal structure a table of labelled instances.
Given a novel instance, the classification process is based on the search for
all matching instances in the table. If no matching instances are found, unknown is
returned; otherwise, the majority class of the matching instances is returned (there may
be multiple matching instances with conflicting labels). The indispensable attributes,
reducts, and core can be similarly defined relative to a decision attribute or output
feature. The precise definitions of these concepts can be found in Pawlak's book on
Rough Sets [5].
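The classification procedure of a decision table, as described above, can be written in a few lines. The function name `classify`, the attribute names and the table content are hypothetical; the decision attribute is assumed to be stored under the key `d`.

```python
from collections import Counter


def classify(table, instance, attrs):
    """Return the majority class among the rows that agree with the
    instance on all condition attributes, or 'unknown' if none does."""
    labels = [row["d"] for row in table
              if all(row[a] == instance[a] for a in attrs)]
    if not labels:
        return "unknown"
    return Counter(labels).most_common(1)[0][0]


table = [  # hypothetical labelled instances
    {"temp": "high", "wind": "low", "d": "bloom"},
    {"temp": "high", "wind": "low", "d": "bloom"},
    {"temp": "high", "wind": "low", "d": "none"},
    {"temp": "low", "wind": "high", "d": "none"},
]
```

Reducing the set of condition attributes passed in `attrs` makes more rows match a novel instance, which is why attribute reduction directly affects the behaviour of this kind of classifier.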

At this point, it is very important to use the classification rules (given by a decision 
table) with the minimal effort, and therefore, the simplification of decision tables is of 
primary importance. The simplification process comprises two fundamental tasks. On 
the one hand, reduction of attributes consists of removing redundant or irrelevant 
attributes, without losing any essential classification information. The computation of 
the reducts for the condition attributes relative to the decision attribute is carried out 
to achieve this goal. On the other hand, reduction of attribute values is related to the 
elimination of the greatest number of condition attribute values, maintaining also the 
classificatory power. 



3 Feature Subset Selection Using Rough Sets 

The computation of the reducts and the core of the condition attributes from a deci- 
sion table is a way of selecting relevant features. It is a global method in the sense 
that the resultant reduct represents the minimal set of features which are necessary 
to maintain the same classificatory power given by the original and complete set of 
attributes. A more direct way of selecting relevant features is to assign a measure
of relevance to each attribute and choose the attributes with higher values.

In the Rough Set framework, the natural way to measure the prediction success is
the degree of dependency defined above. However, [8] have shown the weakness of
this measure for estimating the predictive accuracy of a set of
condition attributes $Q$ with regard to a class attribute $d$. To overcome these deficiencies,
[9] define the notion of rough entropy. Based on this notion and its adaptation
to the VPRS model (in order to exploit more efficiently the knowledge that is provided
by the observations in the boundary region or the uncertain area of the universe),
we have defined a coefficient that allows the significance of an
attribute within a set of attributes to be assessed [10]. The significance of an attribute $a \in Q$ is
defined in such a way that its value is greater when the removal of this attribute leads to
a greater diminution of the complexity of the hypothesis $Q \setminus \{a\}$ and, simultaneously,
to a lesser loss of accuracy of the hypothesis. Implicitly, the underlying principle
used to evaluate the relevance of an attribute in this way is the Minimum Description
Length Principle (MDLP) [11].

The complexity associated with a given set of condition attributes $Q$ can be evaluated
through the entropy of the partition $U / IND(Q)$, denoted by $H(Q)$. On the other hand,
the conditional rough entropy $H_\phi(d \mid Q)$ can be used to evaluate the accuracy
achieved when the condition attributes $Q$ are used to predict the value of the decision
attribute $d$. Therefore, the formal definition of the $\phi$-rough entropy, denoted by
$RH_\phi(Q, d)$, is given by the following expression:
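
$$
\begin{aligned}
RH_\phi(Q, d) &= H(Q) + H_\phi(d \mid Q) \\
&= \{1 - \gamma_{Q,\phi}(d)\} \log_2 |U| + \sum_{X_i \subseteq POS_{Q,\phi}(d)} \frac{|X_i|}{|U|} \log_2 \frac{|U|}{|X_i|} \\
&= \{1 - \gamma_{Q,\phi}(d)\} \log_2 |U| - \sum_{X_i \subseteq POS_{Q,\phi}(d)} \frac{|X_i|}{|U|} \log_2 \frac{|X_i|}{|U|}
\end{aligned}
\qquad (7)
$$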

where $X_i$ represents each one of the classes of the partition $U / IND(Q)$, the set
$POS_{Q,\phi}(d)$ is the positive region of $Q$ with regard to the decision attribute $d$,
and $\gamma_{Q,\phi}(d)$ is the degree of dependence of attribute $d$ on the set of
attributes $Q$.

Then, the $\phi$-significance of a condition attribute $a \in Q$ with regard to the
decision attribute $d$, denoted by $\sigma_{a}^{\phi}(Q, d)$, is defined as the variation
that the $\phi$-rough entropy undergoes when the attribute is removed from $Q$. Namely, we
compute the term $\Delta_a RH_\phi(Q, d)$, given by the difference between $RH_\phi(Q, d)$
and $RH_\phi(Q \setminus \{a\}, d)$. Formally,



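$$
\begin{aligned}
\sigma_{a}^{\phi}(Q, d) = \Delta_a RH_\phi(Q, d) &= RH_\phi(Q, d) - RH_\phi(Q \setminus \{a\}, d) \\
&= \{H(Q) - H(Q \setminus \{a\})\} - \{H_\phi(d \mid Q \setminus \{a\}) - H_\phi(d \mid Q)\}
\end{aligned}
$$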
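
The two definitions above can be illustrated with a minimal Python sketch. It is not the
authors' implementation: the function names are hypothetical, and the positive region is
assumed to be the union of the classes of $U / IND(Q)$ whose majority decision value covers
at least a fraction $1 - \phi$ of the class, which is one common reading of the VPRS model.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into the classes of U / IND(attrs)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def rough_entropy(rows, attrs, decision, phi=0.0):
    """phi-rough entropy RH_phi(Q, d), last line of Eq. (7)."""
    n = len(rows)
    positive = []
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        majority_share = max(counts.values()) / len(block)
        # Assumed VPRS inclusion test: the class is (almost) pure.
        if majority_share >= 1.0 - phi:
            positive.append(block)
    gamma = sum(len(b) for b in positive) / n  # degree of dependency
    deterministic = -sum((len(b) / n) * log2(len(b) / n) for b in positive)
    return (1.0 - gamma) * log2(n) + deterministic


def significance(rows, attrs, a, decision, phi=0.0):
    """sigma = RH_phi(Q, d) - RH_phi(Q minus {a}, d)."""
    reduced = [x for x in attrs if x != a]
    return (rough_entropy(rows, attrs, decision, phi)
            - rough_entropy(rows, reduced, decision, phi))


# Toy decision table: attribute "a" determines d, attribute "b" is irrelevant.
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
sigma_a = significance(rows, ["a", "b"], "a", "d")
sigma_b = significance(rows, ["a", "b"], "b", "d")
```

On this toy table, removing "b" halves the number of classes without any loss of
dependency, whereas removing "a" empties the positive region, so the two attributes
receive different coefficients.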

Fig. 1 provides a concise description of the algorithm that selects a subset of relevant
features, using the $\phi$-rough significance coefficient to evaluate the relevance of a
feature. The algorithm is described according to the view proposed by Blum and Langley
[12]. These authors state that a convenient paradigm for viewing feature selection methods
is that of heuristic search, with each state in the search space specifying a subset of
the possible features. Following Blum and Langley's viewpoint, the four basic issues that
characterise this method are:

- The starting point in the space, which in turn influences the direction of search and
the operators used to generate successor states. The proposed algorithm starts with
all attributes and successively removes them (lines 1 and 15, respectively). This
approach is known as backward elimination.

- The organisation of the search. Since an exhaustive search is impractical, any realistic
approach relies on a greedy method to traverse the space. At each point in the search,
the proposed algorithm considers all local changes; namely, it evaluates the significance
of each attribute in the current set of attributes (the for loop).

- The strategy used to evaluate alternative subsets of attributes. In this paper, the
variation of the normalised $\phi$-rough entropy has been chosen for this purpose.
Specifically, at each decision point the next state selected is the one that results from
removing the attribute with the lowest $\phi$-rough significance coefficient (line 10).

- A criterion for halting the search. In the algorithm, the search continues only while
the difference between the degree of dependency at the initial state and at the current
one (both with respect to the decision) does not exceed a predefined threshold (line 14);
otherwise it halts.

Function FEATURE_SUBSET_SELECTION (input: decisionTable, φ; output: outputFeatures)

00  begin
01    outputFeatures ← Features(decisionTable)   /* the starting point is the complete set of input features */
02    γ_initial ← CoefDependence(decisionTable, outputFeatures, TargetFeature(decisionTable), φ)
03    step ← 0; haltCriterion ← FALSE
04    while ¬haltCriterion do   /* while the halting criterion is not satisfied, remove features */
05      haltCriterion ← TRUE; step ← step + 1
06      σ_max ← 1.0;
07      for each feature f ∈ outputFeatures do   /* select the most irrelevant feature */
08        σ_f ← CoefSignificance(decisionTable, f, outputFeatures, TargetFeature(decisionTable), φ)
09        γ ← CoefDependence(decisionTable, outputFeatures \ {f}, TargetFeature(decisionTable), φ)
10        if (σ_f < σ_max) then
11          γ_current ← γ
12          feature_to_remove ← f
13      γ_threshold ← γ_initial * (1.0 − DecrementFactor(step));
14      if (γ_current >= γ_threshold) then
15        outputFeatures ← outputFeatures \ {feature_to_remove}   /* remove feature */
16        haltCriterion ← FALSE
17  end





Fig. 1. Algorithm for feature subset selection 
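
The backward elimination of Fig. 1 can be sketched in Python as follows. This is a hedged
illustration rather than the platform's code: `dependence`, `significance` and
`decrement_factor` are hypothetical caller-supplied callables standing in for
CoefDependence, CoefSignificance and DecrementFactor, the sketch assumes that the feature
with the smallest coefficient is the candidate at each step, and the guard that keeps at
least one feature is an added assumption.

```python
def feature_subset_selection(features, dependence, significance, decrement_factor):
    """Greedy backward elimination in the spirit of Fig. 1.

    dependence(subset) returns the degree of dependency of the decision on
    the subset; significance(f, subset) returns the coefficient of f.
    """
    selected = list(features)
    gamma_initial = dependence(selected)
    step = 0
    while len(selected) > 1:  # assumption: never return an empty set
        step += 1
        # Candidate: the feature with the smallest significance coefficient.
        candidate = min(selected, key=lambda f: significance(f, selected))
        remaining = [f for f in selected if f != candidate]
        gamma_threshold = gamma_initial * (1.0 - decrement_factor(step))
        if dependence(remaining) < gamma_threshold:
            break  # removing the candidate would cost too much dependency
        selected = remaining
    return selected


# Toy usage with made-up coefficients (illustrative values only).
scores = {"temp": 0.9, "salinity": 0.8, "noise1": 0.1, "noise2": 0.2}
kept = feature_subset_selection(
    ["temp", "salinity", "noise1", "noise2"],
    dependence=lambda s: 1.0 if {"temp", "salinity"} <= set(s) else 0.4,
    significance=lambda f, s: scores[f],
    decrement_factor=lambda step: 0.05,
)
```

Passing the coefficient functions as parameters keeps the search loop independent of how
the dependency and significance are actually computed.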



4 Description of the FSfRT Platform 

The study described in this paper was carried out in the context of the FSfRT platform.
FSfRT is a structured hybrid system that can employ several soft computing techniques in
order to accomplish the four steps of the classical CBR life cycle [1]. This section
covers two main points: (i) the architecture of the FSfRT platform and (ii) the use of
Rough Sets inside the whole system.

4.1 FSfRT Platform Architecture 

The FSfRT platform is an extension of a previous successful system [13] able to make
predictions of red tides (discolourations caused by dense concentrations of microscopic
sea plants, known as phytoplankton). The FSfRT platform allows us to combine several soft
computing techniques in order to test their suitability for working together to solve
complex problems. The core and the interfaces of FSfRT have been coded in Java, and new
capabilities are being developed. The general idea is to have different programmed
techniques able to work separately and independently, in co-operation with the rest. The
main goal is to obtain a general structure that can change dynamically depending on the
type of problem. Fig. 2 shows a schematic view of the system.

The left of Fig. 2 shows the core of the platform, which is composed of a KAM (Knowledge
Acquisition Module). The KAM is able to store all the information needed by the different
techniques employed in the construction of a final CBR system. In the retrieve and reuse
stages, several soft computing techniques can be used [2,3], while in the revise stage
our platform employs a set of TSK fuzzy systems [14] in order to validate the initial
solution proposed by the system.

Fig. 2. FSfRT platform architecture

Our aim in this work is to perform a feature subset selection step before the induction
process carried out by the fuzzy revision method. This reduces the original set of
attributes and therefore decreases the computational effort needed to generate the
different fuzzy models.



4.2 Rough Sets Inside the FSfRT Platform 

Fig. 3 shows the meta-level process when incorporating the Rough Sets as a pre- 
processing step before the generation of the fuzzy revision subsystem. 




Fig. 3. Rough Set pre-processing step 



For details related to the construction of the fuzzy systems starting from a Radial
Basis Function neural network, see [14]. The Rough Set process described here is the
first step in the generation of the initial fuzzy system and is divided into three phases:

The first one discretises the cases stored in the case base. This is necessary in order
to find the most relevant information using Rough Set theory. The second one uses the
$\phi$-rough significance coefficient to select a subset of relevant features, as
described in section 3 (see Fig. 1). Finally, the last phase searches for the reducts and
the core of knowledge from the features selected in the previous phase, as explained in
section 2.

The motivation for including the second phase is that the computation of reducts is a
blind technique, in which several combinations of a sufficient number of irrelevant
features can become a reduct. The pre-selection of features leads to reducts with lower
complexity and higher predictive accuracy.

5 Empirical Study 

In order to evaluate the proposed method, we use a biological database composed of
several physical variables (temperature, pH, oxygen, salinity, etc.) measured at distinct
depths and belonging to different monitoring points on the north-west coast of the
Iberian Peninsula. These data values are complemented with data derived from satellite
images, stored separately. The satellite image data values are used to generate cloud and
surface temperature indexes. The whole memory of the system consists of approximately
6300 cases, each one represented as a feature vector that holds 56 attributes.

The FSfRT platform was configured to use the same techniques as in our previous work
[14], where the fuzzy revision method was successfully tested: (i) a Growing Cell
Structure (GCS) neural network as the retrieval method, (ii) a Radial Basis Function
(RBF) neural network for the reuse step and (iii) the aforementioned set of TSK fuzzy
systems working as the revision mechanism. Specific information about these techniques
and their integration inside the CBR life cycle can be found in [15]. The main goal of
the previous work was to develop a biological forecasting system capable of predicting
the concentration of diatoms (a type of single-celled algae) in different water masses.

Although the experiments carried out in [14] showed the effectiveness of the proposed
fuzzy revision method and its clear improvement over other approaches, some issues
remained unsolved before the application could be deployed for real use. Specifically,
the main drawbacks of the tested method were: (i) the time needed to generate each one of
the TSK fuzzy models and (ii) the explanatory complexity of the fuzzy rules used for the
final solution proposed by the system.

In order to solve these problems while maintaining the accuracy level, we propose in this
paper a feature subset selection algorithm based on Rough Set theory. As Table 1 shows,
several $\phi$ values have been tested in order to obtain the most accurate set of
representative features defining each problem case. For the current domain of diatom
forecasting, the optimal number of features was 12 ($\phi$ = 0.01), corresponding to the
physical magnitudes measured at shallower depths and those generated from satellite
images.

Table 1. Selected features depending on the value of the parameter $\phi$

  parameter $\phi$    number of selected features
  0.0                 16
  0.01                12
  0.025               10
  0.05                8
  0.1                 8

A crucial aspect of this experiment is the accuracy level of the Rough Set based revision
subsystem and its comparison with the original one. Starting from the error series
generated by the different models, the Kruskal-Wallis test has been carried out. Since
the P-value is less than 0.01, there is a statistically significant difference among the
models at the 99.0% confidence level. Fig. 4 shows a multiple comparison procedure
(Mann-Whitney) used to determine which models are significantly different from the
others. The experiments were made with a data set of 448 cases randomly taken from the
case base. It can be seen that the CBR with TSK fuzzy revision subsystem (CBR TSK)
presents statistically significant differences with the rest of the models, whilst it is
as accurate as the simplified method presented here (CBR $\phi$ (TSK)).

Fig. 4. Mann-Whitney test carried out between each pair of models



The time consumed in the execution of the pre-processing step plus the whole generation
of the TSK fuzzy systems (approximately 2 hours on a Pentium IV processor) was 80% less
than the time needed to generate the original fuzzy revision subsystem. This time saving
is due to the simplified fuzzy rule base employed by the greedy algorithm used to
generate each one of the TSK fuzzy systems.

Another relevant consequence of adopting the proposed scheme was the increase in the
explanatory strength of the justification generated by the final CBR system. Initially,
the feature vector describing a problem was composed of 56 attributes, the same as the
fuzzy rule antecedents. Now the system is able to produce an explanation based on only
12 main features with the same level of accuracy.



6 Conclusion and Further Work 



This paper introduces a new reduction technique based on Rough Set theory that can be
applied to improve a previous successful method that automates the revision stage of CBR
systems.

Empirical studies show that this reduction technique allows us to obtain a more general
knowledge of the model and gain a deeper insight into the logical structure of the system
to be approximated. Employing the simplified fuzzy rule base as the starting point to
generate the fuzzy revision subsystem proposed in [14] leads to a dramatic decrease in
the time needed for this task while maintaining an equivalent generalisation accuracy.

These benefits are augmented by the simplicity of the new fuzzy rules used by the CBR
system as an explanation of the final adopted solution. In this regard, it would be
interesting to define a formal measure in order to rate and compare the explanatory
strength of these fuzzy rules.

Given the suitability shown by Rough Set theory when working together with other soft
computing techniques, we are also interested in developing new ways of combining this
formalism with the existing techniques coded in the FSfRT platform.



References 

1. Riesbeck, C.K., Schank, R.C.: Inside Case-Based Reasoning, Lawrence Erlbaum Associ- 
ates, Hillsdale, NJ, US (1999) 

2. Pal, S.K., Dillon, T.S., Yeung, D.S.: Soft Computing in Case Based Reasoning, Springer
Verlag, London (2000)

3. Sankar, K.P., Simon, C.K.S: Foundations of Soft Case-Based Reasoning, Wiley- 
Interscience, Hoboken, New Jersey (2003) 

4. Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences,
Vol. 11. (1982) 341-356

5. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic 
Publishers, Dordrecht (1991) 

6. Pawlak, Z.: Rough sets: present state and the future. Foundations of Computing and
Decision Sciences, Vol. 11 (3-4). (1993) 157-166

7. Ziarko, W.: Variable Precision Rough Set Model. Journal of Computer and System Sci- 
ences, Vol. 46. (1993) 39-59 

8. Düntsch, I., Gediga, G.: Statistical evaluation of rough set dependency analysis.
International Journal of Human-Computer Studies, Vol. 46. (1997) 589-604

9. Düntsch, I., Gediga, G.: Uncertainty measures of rough set prediction. Artificial
Intelligence, Vol. 106. (1998) 77-107

10. Diaz, F., Corchado, J.M.: A method based on the Rough Set theory and the MDL principle 
to select relevant features. Proc. of the X CAEPIA - V TTIA, Vol. 1. (2003) 101-104 

11. Rissanen, J.: Minimum description length principle. In Kotz, S. and Johnson, N. L. (eds.). 
Encyclopedia of Statistical Sciences. John Wiley and Sons, New York (1985) 523-527 

12. Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. 
Artificial Intelligence, Vol. 97. (1997) 245-271 

13. Fdez-Riverola, F., Corchado, J.M.: FSfRT, Forecasting System for Red Tides. An Hybrid
Autonomous AI Model. Applied Artificial Intelligence, Vol. 17 (10). (2003) 955-982

14. Fdez-Riverola, F., Corchado, J.M.: An automated CBR Revision method based on a set of
P-TSK Fuzzy models. Proc. of the X CAEPIA - V TTIA, Vol. 1. (2003) 395-404

15. Fdez-Riverola, F.: Neuro-symbolic model for unsupervised forecasting of changing
environments. Ph.D. diss., Dept. of Computer Science, Vigo University, Spain (2002)




Dynamic Case Base Maintenance 
for a Case-Based Reasoning System 



Maria Salamo and Elisabet Golobardes 



Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, 
Quatre Camins 2, 08022 Barcelona, Catalonia, Spain 

{mariasal, elisabet}@salleurl.edu



Abstract. The success of a case-based reasoning system depends critically on the
relevance of the case base. Much current CBR research focuses on how to compact and
refine the contents of a case base at two stages, acquisition or learning, along the
problem solving process. Although the two stages are closely related, there is little
research on using strategies at both stages at the same time. This paper presents a
model that updates itself dynamically, taking information from the learning process.
Different policies have been applied to test the model. Several experiments show its
effectiveness in different domains from the UCI repository.



1 Introduction 

Learning is a process in which an organized representation of experience is constructed
[Scott, 1983]. However, this experience causes two problems in Case-Based Reasoning (CBR)
systems, as reported in recent years. The first one is the swamping problem, which
relates to the expense of searching large case bases for appropriate cases with which to
solve the current problem. The second one is that the experience can be harmful and may
degrade the system performance (understanding performance as problem solving efficiency).

Research in the area highlights the need to deal with negative knowledge using different
strategies. Negative knowledge is correct knowledge that can be a source of unsuccessful
performance [Markovitch, S. and Scott, P.D., 1988]. Minton has demonstrated, by
selectively discarding knowledge in a system [Minton, 1985], that the performance can be
improved. Usually, the strategy of avoiding negative knowledge in the initial case base
is not enough to achieve maximum performance for a CBR system. It is usually also
necessary to integrate repeated maintenance into the system during the problem solving
process. There are several methods that fulfill these requirements, such as
competence-preserving deletion [Smyth and Keane, 1995], failure-driven deletion
[Portinale et al., 1999], and methods for generating compact case memories [Smyth and
Mckenna, 2001]. Closer to our proposal is the work that examines the benefits of using
fine-grained performance metrics to guide case addition or deletion [Leake and Wilson,
2000].

Prior to this paper, we presented different approaches to case base maintenance [Salamo
and Golobardes, 2003] in the acquisition stage that allow us to reduce the case base in a
controlled way and, at the same time, maintain the efficiency of the CBR system. Although
our objectives were achieved, our previous conclusions and the research in the area have
moved us to look more deeply into an extended treatment of the case base.

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 93-103, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

This paper introduces a dynamic case base maintenance (DCBM) model that updates the
knowledge (the case base in CBR) based on what is learned during the problem solving
process. The knowledge update is based on Reinforcement Learning. This approach can be
considered a "wrapper" model for case base maintenance. However, we propose it as a
dynamic model because it depends completely on the problem solving process of the CBR
system.

The paper is organized as follows. Section 2 introduces the dynamic case base maintenance
model and the different policies used to apply it. Section 3 details the fundamentals of
our experiments. The next section presents and analyzes the effectiveness of the model
through the experimental results. Finally, we present the conclusions and further work.



2 Dynamic Case Base Maintenance 

The foundation of our Dynamic Case Base Maintenance (DCBM) proposal is Reinforcement
Learning, so we first summarize its basis. Next, we describe how to use Reinforcement
Learning in our system, how the coverage of a CBR system can be modelled, and how
different policies can exploit this model to perform a dynamic experience update able to
control and optimize the case base while solving new cases.

2.1 Reinforcement Learning 

Reinforcement Learning (RL) [Sutton and Barto, 1998] combines the fields 
of dynamic programming and supervised learning to yield powerful machine- 
learning systems. Reinforcement Learning appeals to many researchers because 
of its generality. 

Reinforcement Learning [Harmon, 1996] is an approach to learning by trial and error in
order to achieve a goal. An RL algorithm does not use a set of instances which show the
desired input/output response, as supervised learning techniques do. Instead, a reward
given by the environment is required. This reward evaluates the current state of the
environment. The Reinforcement Learning Problem (RLP) consists of maximizing the sum of
future rewards. The goal to be accomplished by RL is encoded in the received reward. To
solve the problem, an RL algorithm acts on the environment in order to yield maximum
rewards. Any algorithm able to solve the RLP is considered an RL algorithm.

Reinforcement Learning theory is usually based on Finite Markov Decision Processes
(FMDP). The use of FMDP allows a mathematical formulation of the RLP; therefore, the
suitability of RL algorithms can be mathematically proved.

Several elements appear in all RLPs. In each iteration the RL algorithm observes the
state $s_t$ of the environment and receives the reward $r_t$. The reward is a scalar
value generated by the reinforcement function, which evaluates the current state and/or
the last executed action according to the RLP. Following the rules of the RL algorithm,
it generates an action $a_t$. The environment reacts to the action by changing state
$s_t$ and generating a new state $s_{t+1}$. The value function contains the expected sum
of future rewards. This function is used and modified by the RL algorithm to learn the
policy function. A policy function indicates the action to be taken at each moment.

Initially, the approximation of the optimal value function is poor. Therefore, 
it is necessary to approximate the value function at every iteration. There are 
several methods that can be applied. 

In order to find the optimal value functions, the Bellman equation is applied:
$V^*(X_t) = r(X_t) + \gamma V^*(X_{t+1})$, where $V^*(X_t)$ is the optimal value
function; $X_t$ is the state vector at time $t$; $X_{t+1}$ is the state vector at time
$t + 1$; $r(X_t)$ is the reinforcement function; and $\gamma$ is the discount factor in
the range [0,1].
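
As a worked illustration of the recursion above, the following Python sketch evaluates
the equation backwards along a fixed deterministic chain of states. The function name and
the toy rewards are hypothetical, and the value after the last state is assumed to be 0.

```python
def chain_values(rewards, gamma):
    """Values along a deterministic chain X_0 -> X_1 -> ... -> X_n.

    Applies V(X_t) = r(X_t) + gamma * V(X_{t+1}) from the last state
    backwards; the value beyond the last state is taken as 0 (assumption).
    """
    values = [0.0] * len(rewards)
    next_value = 0.0
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * next_value
        next_value = values[t]
    return values


# Only the last state is rewarded; earlier states inherit a discounted share.
values = chain_values([0.0, 0.0, 1.0], gamma=0.5)
```

With a discount factor of 0.5, each step further from the rewarded state halves the
value, which is the effect the discount factor is meant to capture.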

2.2 Dynamic Case Base Maintenance Model 

There are several methodologies to solve the RLP formulated as an FMDP: dynamic
programming, temporal difference algorithms and Monte-Carlo methods. We will use a
Monte-Carlo method because it is the only one that uses experience of the environment to
learn the value functions.

The question that arises now is how this idea can be applied to our model. Let us
consider the model by analogy with the elements described in section 2.1. For our
purpose, a state $s_t$ is a case of the environment that receives a reward $r_t$. The
reward is a value generated by the reinforcement function, which evaluates whether or not
the current state classifies correctly. In our model the reinforcement function is the
revise phase of the CBR cycle. Following the rules of the RL algorithm, which includes
the case base maintenance policy, it generates an action $a_t$. The action for us is to
delete a case from, or maintain it in, the case base. The environment is the CBR cycle.
The environment reacts to the action by changing to state $s_{t+1}$ if the action is to
delete the case, thus reducing the case base. The environment also generates a new reward
after the problem solving process, which has used the possibly reduced case base. The
value function contains the expected sum of future rewards. This function is used and
modified by the RL algorithm to learn the optimal case base. We test two different policy
functions. Figure 1 shows the whole process. In our case, the RL algorithm receives a set
of states and a reward for each one, and returns a set of actions to the environment.

Definition 1 (Coverage).

Let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of training cases. $\forall t_i \in T$:
$Coverage_k(t_i)$ will be the value of the metric used by the case base maintenance
method at iteration $k$.

The coverage is the goodness value of a case when it is used to solve a target problem.
It can be defined in several ways depending on the case base maintenance techniques used.
For instance, it can be defined [Smyth and Keane, 1995] as the set of target problems
that the case can be used to solve. Here, we modify the definition slightly in order to
adapt it to our model. The coverage is defined as the initial sum of future rewards using
a Rough Sets measure. That is, $Coverage_k(t_i)$ is the value function at iteration $k$
for state $t_i$.



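[Figure: the RL algorithm receives the state $s_t$ and the reward $r_t$ from the
Environment (CBR) and returns the actions $a_t$; the environment then yields $s_{t+1}$
and $r_{t+1}$.]

Fig. 1. Relation between RL algorithm and the environment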



As detailed previously, the most important part of the RL algorithm is updating the value
function. We use a Monte-Carlo (MC) algorithm, which interacts with the environment
following a particular policy function. In our model it is the optimizer of the case
base. When the episode finishes, the MC algorithm updates the value of all visited states
based on the received rewards. The visited states for a CBR cycle will be the KNN cases
retrieved to solve the new problem. Equation 1 shows the general update rule to estimate
the state-value function. Our MC algorithm is detailed in definition 2.

Definition 2 (CoverageUpdate).

Let $T = \{t_1, t_2, \ldots, t_K\}$ be a set of KNN cases. $\forall t_i \in T$:

$$Coverage_{k+1}(t_i) \leftarrow Coverage_k(t_i) + \alpha \cdot [R_t - Coverage_k(t_i)] \qquad (1)$$

It can be observed that the current prediction of the state-value $Coverage_k(t_i)$ is
modified according to the received sum of rewards $R_t$. The $R_t$ value is 1.0 if the
state $t_i$ solves the target problem; otherwise it is 0. There is also a learning rate
$\alpha$, which averages the values obtained in different episodes. The learning rate is
usually set to 0.1 or 0.2 in RL systems. If the states are updated quite often it is set
to 0.1, otherwise to 0.2. The selection of the same KNN neighbors in a CBR cycle may not
be repeated often, so we have set this learning rate to 0.2 in order to make differences
in Coverage emerge within a few iterations.
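
The update rule of Equation 1 can be sketched in a few lines of Python. The function name
and the dictionary-based representation of cases are assumptions made for illustration
only.

```python
def coverage_update(coverage, retrieved, solved, alpha=0.2):
    """Apply Eq. (1) to every retrieved case.

    coverage: dict case id -> current coverage;
    retrieved: ids of the KNN cases;
    solved: dict case id -> True if the case solved the target problem.
    """
    updated = dict(coverage)
    for case in retrieved:
        reward = 1.0 if solved[case] else 0.0
        updated[case] = coverage[case] + alpha * (reward - coverage[case])
    return updated


coverage = {"c1": 0.5, "c2": 0.5, "c3": 0.9}
new_coverage = coverage_update(
    coverage, retrieved=["c1", "c2"], solved={"c1": True, "c2": False}
)
```

In this toy run the coverage of "c1" moves towards 1.0, that of "c2" moves towards 0, and
"c3", which was not retrieved, keeps its value.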

Having described our value function update, we now describe the complete dynamic case
base maintenance (DCBM) model in algorithm 1. It shows that the retrieval phase selects K
Nearest-Neighbors, although it uses only the best neighbor to solve the new problem. We
select KNN in order to accelerate the maintenance process of the case base. Another
important point is the relation of the retain stage with the RL algorithm (steps 9 and 10
in algorithm 1). The retain phase receives the set of actions to improve the case base.

The most notable aspect of the dynamic case base maintenance process is that the CBR
system improves the case base using its problem solving process. Moreover, the coverage
of a case improves or degrades depending on its resolution accuracy. Thus, the case base
can be categorized at different levels of coverage. The lower the coverage of a case, the
more appropriate it is to remove it from the case base.

Algorithm 1 Dynamic Case Base Maintenance (DCBM) Model

DCBM (CaseMemory T)
 1. Initialize Coverage(t_i) using a CBM metric in acquisition stage, for all t_i ∈ T
 2. T_{k+1} ← Reduce the initial case base T_k using Coverage
 3. Repeat until the problem solving process of the CBR cycle is finished
 4.   T_k ← T_{k+1}
 5.   Retrieval phase ← selects from T_k the KNN used to solve the new problem
 6.   Reuse phase ← selects the best 1NN to solve the new problem
 7.   Revise phase ← computes the rewards R_t of the KNN
 8.   Retain phase ← computes:
 9.     CoverageUpdate ← for each t_i ∈ KNN
10.     Apply case base maintenance policy function to decide the set of Actions A
11.     T_{k+1} ← Update case base T_k based on the Actions A
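
The loop of Algorithm 1 (steps 3 to 11) can be sketched in Python as follows. This is an
illustrative sketch under several assumptions that are not in the source: cases are
numeric feature tuples with a class label, retrieval uses squared Euclidean distance, a
retrieved case is rewarded when its label matches the label of the new problem, and the
initial reduction of steps 1 and 2 is left to the caller. All names are hypothetical.

```python
def dcbm(case_base, coverage, problems, policy, k=3, alpha=0.2):
    """Run the DCBM loop over a stream of (features, label) problems.

    case_base: dict id -> (features, label); coverage: dict id -> float;
    policy(case_base, coverage) returns the ids to delete (the Actions A).
    """
    case_base, coverage = dict(case_base), dict(coverage)
    predictions = []
    for features, label in problems:
        if not case_base:
            break  # nothing left to retrieve from

        def distance(cid):
            return sum((x - y) ** 2 for x, y in zip(case_base[cid][0], features))

        # Retrieve: the k nearest cases (assumed distance measure).
        neighbours = sorted(case_base, key=distance)[:k]
        # Reuse: the best neighbour proposes the solution.
        predictions.append(case_base[neighbours[0]][1])
        # Revise: reward each neighbour and update its coverage, Eq. (1).
        for cid in neighbours:
            reward = 1.0 if case_base[cid][1] == label else 0.0
            coverage[cid] += alpha * (reward - coverage[cid])
        # Retain: apply the maintenance policy to the case base.
        for cid in list(policy(case_base, coverage)):
            case_base.pop(cid, None)
            coverage.pop(cid, None)
    return case_base, coverage, predictions


cases = {"a": ((0.0,), "x"), "b": ((1.0,), "x"), "c": ((0.4,), "y")}
start = {"a": 0.5, "b": 0.5, "c": 0.5}
drop_low = lambda cb, cov: [cid for cid, v in cov.items() if v < 0.45]
final_cases, final_cov, preds = dcbm(
    cases, start, [((0.1,), "x"), ((0.2,), "x")], drop_low, k=2
)
```

The policy is passed in as a function so that different maintenance policies can be
plugged into the same loop.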

2.3 Dynamic Case Base Maintenance Policy Functions 

The core of the RL process is the case base maintenance policy function. We 
describe two different policies to test the reliability of the proposed Dynamic 
Case Base Maintenance (DCBM) model. 

RLOLevel. This policy is called Reinforcement Learning Oblivion policy by Level of
Coverage (RLOLevel). It follows a philosophy similar to that of our acquisition-stage
case base maintenance method, called ACCM [Salamo and Golobardes, 2003]. If we start from
the premise that ACCM works well to reduce the case base while maintaining the prediction
accuracy, it leads us to believe that the same process will be useful for dynamic
maintenance. The complete process is detailed in algorithm 2.



Algorithm 2 RLOLevel

1. SelectCasesRLOLevel (CaseMemory T)
2.   confidenceLevel = 1.0 and freeLevel = ConstantTuned (set at 0.01)
3.   select all instances t_i ∈ T as SelectCase(t_i) if t_i satisfies: coverage(t_i) ≥ confidenceLevel
4.   while not ∃ at least a t_i in SelectCase for each class c that class(t_i) = c
5.     confidenceLevel = confidenceLevel − freeLevel
6.     select all instances t_i ∈ T as SelectCase(t_i) if t_i satisfies: coverage(t_i) ≥ confidenceLevel
7.   end while
8.   Action A is to delete from CaseMemory T the set of cases NOT selected as SelectCase
9. return Action A





Algorithm 2 tries to remove as many cases as possible. The selection process is therefore
repeated until every class contains at least one selected case, and the cases not
selected are then removed from the case base. It is clear that this process will be very
aggressive with the case base, because it maintains only the minimum description of the
case base. This leads us to believe that this policy function may not work properly in a
dynamic environment.
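
A minimal Python sketch of an RLOLevel-style policy is given below. The function name and
data layout are hypothetical, the comparison is assumed to be "greater than or equal to",
and the confidence level is recomputed from an integer step counter (rather than
decremented repeatedly) to limit floating-point drift.

```python
def rlo_level(coverage, labels, free_level=0.01):
    """Return the ids of the cases to delete.

    coverage: dict id -> coverage in [0, 1]; labels: dict id -> class.
    The confidence level is lowered until every class keeps a case.
    """
    classes = set(labels.values())
    step = 0
    while True:
        confidence = 1.0 - step * free_level
        selected = {cid for cid, c in coverage.items() if c >= confidence}
        covered = {labels[cid] for cid in selected}
        if covered == classes or confidence <= 0.0:
            break
        step += 1
    return set(coverage) - selected


coverage = {"a": 1.0, "b": 0.95, "c": 0.62, "d": 0.30}
labels = {"a": "x", "b": "x", "c": "y", "d": "y"}
to_delete = rlo_level(coverage, labels, free_level=0.1)
```

In this toy example the level drops until class "y" is represented by case "c", and only
the lowest-coverage case "d" is marked for deletion.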

RLOCE. This policy is called Reinforcement Learning Oblivion by Coverage and Error
(RLOCE). The coverage is the relevance of a case. This policy shows the simplest way to
decide the actions.



Algorithm 3 RLOCE

1. SelectCasesRLOCE (CaseMemory T)
2.   for each instance t ∈ T
3.     if coverage(t) < initialCoverage(t) then SelectCase(t) end if
4.   Action A is to delete those cases selected
5. return Action A



The policy is based on coverage loss. A case will be deleted if it classifies new
problems incorrectly more often than correctly. Thus, the cases that lead to
misclassifications are deleted.
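
The RLOCE rule amounts to a single comparison per case, as the following Python sketch
shows. The function name and the dictionary representation are hypothetical.

```python
def rloce(coverage, initial_coverage):
    """Ids of the cases whose coverage has fallen below its initial value."""
    return {cid for cid, c in coverage.items() if c < initial_coverage[cid]}


initial = {"a": 0.5, "b": 0.5, "c": 0.5}
current = {"a": 0.6, "b": 0.4, "c": 0.5}
to_delete = rloce(current, initial)
```

Only case "b", whose coverage dropped below its initial value, is selected for deletion;
a case whose coverage is unchanged is kept.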

3 Description of the Experimental Analysis 

This section is structured as follows: first of all, it is important to understand 
the fundamentals of our metric to initialize the coverage of a case. Then, we de- 
scribe the testbed used and its characteristics. Finally, we analyze with different 
experiments the dynamic case base maintenance model. 

3.1 Fundamentals 

The rough sets theory defined by Pawlak, which is detailed in [Pawlak, 1982], 
is one of the techniques for the identification and recognition of common 
patterns in data, especially in the case of uncertain and incomplete data. The 
mathematical foundations of this method are based on the set approximation of 
the classification space. 

Each case is classified using the elementary set of features which cannot be 
split up any further, although other elementary sets of features may exist. In 
the rough set model the classification knowledge (the model of the knowledge) 
is represented by an equivalence relation IND defined on a certain universe 
of cases U and relations (attributes) R. The pair of the universe of cases U and 
the associated equivalence relation IND forms an approximation space. The 
approximation space gives an approximate description of any subset X of U. 
Two approximations are generated by the available data about the elements of 
the set X, called the lower and upper approximations. The lower approximation 
$\underline{R}X$ is the set of all elements of U which can certainly be 
classified as elements of X in knowledge R. The upper approximation 
$\overline{R}X$ is the set of elements of U which can possibly be classified as 
elements of X, employing knowledge R. In order to discover patterns of 
knowledge we should look for the minimal set of attributes that discerns cases 
and classes from each other; such a combination is called a reduct. 
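
The lower and upper approximations described above can be illustrated with a small Python sketch. The toy cases, attribute names and the function name are assumptions introduced only for this example; cases with identical attribute values fall into the same equivalence class of IND.

```python
from collections import defaultdict


def approximations(cases, attributes, target):
    """Return (lower, upper) approximations of `target` (a set of case names).

    cases: dict mapping a case name to a dict of attribute values.
    attributes: the attributes R that define the indiscernibility relation.
    """
    # Group cases that are indiscernible with respect to the chosen attributes.
    blocks = defaultdict(set)
    for name, values in cases.items():
        blocks[tuple(values[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # whole class inside X: certainly in X
            lower |= block
        if block & target:    # class overlaps X: possibly in X
            upper |= block
    return lower, upper


cases = {
    "c1": {"colour": "red", "size": "big"},
    "c2": {"colour": "red", "size": "big"},
    "c3": {"colour": "blue", "size": "big"},
    "c4": {"colour": "blue", "size": "small"},
}
lower, upper = approximations(cases, ["colour", "size"], {"c1", "c3"})
```

Here c1 and c2 are indiscernible, so c1 can only possibly be classified as a member of X = {c1, c3}; the lower approximation is {c3} and the upper approximation is {c1, c2, c3}.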

Measure of Relevance Based on Rough Sets. The reduced space, composed 
of the set of reducts (P), is used as a metric to extract the relevance of each 
case. 

Definition 3 (Coverage Based on Rough Sets). 

This metric uses the quality of classification coefficient, computed as: 

$$\forall t_i \in T: \quad \mathrm{Coverage}(t_i) = \frac{\mathrm{card}(\underline{P}(t_i)) \cup \mathrm{card}(\overline{P}(t_i))}{\mathrm{card}(\text{all instances})}$$

where $\mathrm{Coverage}(t_i)$ is the coverage of case $t_i$; T is the training 
set; card is the cardinality of a set; P is the set that contains the reducts; 
and finally $\underline{P}(t_i)$ and $\overline{P}(t_i)$ denote the presence of 
$t_i$ in the lower and upper approximation, respectively. 



The Coverage coefficient expresses the percentage of cases which can be 
correctly classified employing the knowledge t. This coefficient takes real 
values in the interval [0.0, 1.0], where 0.0 and 1.0 mean that the case is 
internal and outlier, respectively. 

We use the Coverage as initialCoverage in our DCBM model. We also 
use the Coverage in our reduction technique (ACCM) in the acquisition stage. 
Our experiments compare the behaviour of the DCBM model with that of ACCM. 
Our RLOLevel policy function is based on this algorithm. We apply ACCM to the 
training case base to select a range of cases that have to be deleted from the 
case base [Salamo and Golobardes, 2003]. ACCM maintains all the cases that 
are outliers, that is, cases with Coverage = 1.0, and those cases that are 
completely internal, that is, cases with a Coverage near 0.0. Thus, it removes 
from the case base those cases that are not outliers and have a coverage near 
1.0. 

Using coverage values, we have two kinds of relevant cases in the case base: 
those with a coverage value of 1.0 (outliers) and the internal cases, which have 
a low coverage value. This coverage distribution is not well suited to the RL 
policy functions, which rely on high coverage values. Thus, prior to the update 
phase and independently of whether ACCM has been applied, we modify the 
coverage value with the formula Coverage(t) = 1 - Coverage(t), with the 
exception of outlier cases, which have Coverage(t) = 1.0. In this way we obtain 
coverage values that express relevance as the RL policy functions expect. 
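
The rescaling step can be written as a one-line rule. The following Python sketch uses a hypothetical function name; it only restates the formula above.

```python
def rescale_coverage(coverage):
    """Invert the coverage so that relevant internal cases get high values.

    Outliers (coverage exactly 1.0) are left unchanged, as stated in the text.
    """
    if coverage == 1.0:
        return 1.0
    return 1.0 - coverage


examples = [rescale_coverage(c) for c in (1.0, 0.0, 0.25)]
```

After the transformation both kinds of relevant cases, outliers and completely internal cases, carry a coverage of 1.0, while cases that were close to (but not exactly) 1.0 end up near 0.0.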



3.2 Testbed 

The performance of the approaches presented in this paper is evaluated on 
the datasets detailed in Table 1. The datasets can be grouped into public 
[Merz and Murphy, 1998] and private [Golobardes et al., 2002] ones, the latter 
coming from our own repository. These datasets were chosen in order to provide 
a wide variety of sizes, combinations of feature types, and difficulty, because 
some of them contain a high percentage of inconsistencies. 

The percentage of correct classifications and the percentage of the case base 
maintained have been averaged over stratified ten-fold cross-validation runs. 
To study the performance we use a paired t-test on these runs. 

The study described in this paper was carried out in the context of our CBR 
system BASTIAN (case-BAsed SysTem for classIficAtioN). All techniques 
were run using the same set of parameters for all datasets. The case base is a 
list of cases. Each case contains the set of attributes, the class, the Coverage 
and the initialCoverage. Furthermore, the retrieval phase extracts the K-Nearest 
Neighbors to be updated in the RL process, not for the reuse phase, which uses 
a 1-Nearest Neighbor. We do not learn new cases during the problem solving 
stage. 

Table 1. Details of the datasets used in the experimental analysis 

#   Dataset                   Ref.  Samples  Num. feat.  Sym. feat.  Classes  %Inconsistent
1   Balance scale             BL    625      4           3           2        2.0
2   Breast cancer Wisconsin   BC    699      9           -           2        0.30
3   Credit-A                  CA    690      5           9           2        9.71
4   Heart-H                   HH    294      6           7           5        20.4
5   Heart-Statlog             HS    270      13          -           2        0.0
6   Hepatitis                 HP    155      6           13          2        0.0
7   Horse-Colic               HC    368      7           15          2        5.67
8   Ionosphere                IO    351      34          -           2        0.0
9   Iris                      IR    150      4           -           3        0.0
10  Labor                     LB    57       8           8           2        0.0
11  Mammogram (private)       MA    216      23          -           2        5.00
12  Soybean                   SY    683      -           35          19       10.08
13  TAO-Grid (private)        TG    1888     2           -           2        0.0
14  Vehicle                   VE    846      18          -           4        0.0
15  Vote                      VT    435      -           16          2        4.13

4 Analysing the DCBM Policy Functions 

First of all, we test our DCBM policy functions using the whole training set 
against the 1NN algorithm and our reduction algorithm (ACCM) applied in the 
acquisition stage (see columns 2 to 9 in Table 2). We include the ACCM 
algorithm in this experiment in order to compare its case base reduction (size) 
with that of our DCBM policies. 

We observe that the best prediction accuracy is often obtained using oblivion 
by level of coverage (OL) and oblivion by coverage and error (OCE). The ACCM 
algorithm achieves a greater reduction than 1NN. Although the reduction of the 
DCBM policies is not as great as that of ACCM, because their selection of cases 
to delete is restricted to the K-NN retrieved, they produce a good balance 
between reduction and improvement of prediction accuracy. That is, they reduce 
the case base less aggressively than ACCM. 

There is a clear conclusion: if we prefer to reduce the case base while 
maintaining the prediction accuracy of the system, it is better to use the DCBM 
model than ACCM applied only during acquisition. Having analysed the DCBM 
model alone, we now test the combination of the acquisition (ACCM) and 
learning (DCBM) stages. 

Table 2 shows (columns 10 to 15) the results of this combination. In this 
case, the final ACCM case base is the initial one for the DCBM policies. 
Before examining this question in detail, note that there are two results to 
highlight: the percentage of cases maintained by our DCBM policies and the final 
case base size when both processes have finished. The percentage of cases 
maintained during oblivion (obliv), which shows the behavior of the DCBM 
policies, is computed as 

$$\mathit{obliv} = \frac{\#\text{final cases}}{\#\text{cases ACCM}} \times 100$$

The percentage of final case base size (size), which is the percentage of the 
case base maintained from the original training set, is computed as 

$$\mathit{size} = \frac{\#\text{final cases}}{\#\text{training cases}} \times 100$$
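
As a small worked illustration of the two percentages, the following Python sketch computes them for made-up case counts (the function names and the counts are assumptions for the example). Because ACCM runs first, the final size should equal obliv multiplied by the ACCM size divided by 100; the last line checks this arithmetic against the BL row of Table 2.

```python
def percent_obliv(final_cases, accm_cases):
    """Percentage of the ACCM case base kept by the DCBM policy."""
    return 100.0 * final_cases / accm_cases


def percent_size(final_cases, training_cases):
    """Percentage of the original training set kept after both stages."""
    return 100.0 * final_cases / training_cases


# Made-up counts: 1000 training cases, 800 kept by ACCM, 600 kept at the end.
obliv = percent_obliv(600, 800)
size = percent_size(600, 1000)

# BL row of Table 2: obliv = 85.89 and ACCM size = 97.44 for the OL combination.
bl_size = round(85.89 * 97.44 / 100, 2)
```

With the made-up counts, obliv is 75.0 and size is 60.0; for the BL row the product reproduces the reported final size of 83.69.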

Table 2. Results for all methods using an update parameter K-NN = 5. Av shows the 
mean value for all datasets. We use a paired t-test at the 5% significance level, 
where • and ∘ stand for a significant improvement or degradation of the DCBM 
policies and ACCM with respect to 1NN 

      1NN             ACCM            OL              OCE             ACCM+OL                 ACCM+OCE
Ref   cbr     size    cbm     size    cbr     size    cbr     size    cbm     obliv   size    cbm     obliv   size
BL    76.15   100.0   77.27•  97.44   78.73•  88.69   78.73•  88.69   78.11   85.89   83.69   79.04•  90.25   87.94
BC    95.86   100.0   95.43   77.36   95.99   67.93   95.99   97.61   96.26   38.79   30.01   95.98   96.67   74.78
HC    73.36   100.0   70.91∘  86.14   81.24•  88.79   81.24•  88.79   81.24•  81.49   70.19   80.16•  88.53   76.26
CA    81.76   100.0   82.19   84.30   82.63   89.40   82.63   89.40   82.47   86.80   73.17   82.47   87.35   73.63
MA    63.93   100.0   64.53   89.19   62.11   51.44   63.39   77.77   55.88∘  7.55    6.73    64.91   78.95   70.42
TG    96.13   100.0   96.13   95.87   96.66   97.44   96.66   97.44   63.92∘  0.23    0.22    96.60   97.72   93.69
HH    72.82   100.0   72.12   85.63   75.56•  87.86   75.56•  87.86   75.19   14.38   12.32   76.23•  88.12   75.47
HS    74.07   100.0   75.55   79.67   74.81   86.74   74.81   86.74   75.18   29.28   23.33   77.03   85.27   67.94
HP    77.99   100.0   77.33   87.67   78.58   87.67   78.58   87.67   74.75   47.83   41.93   77.87   86.09   75.48
IO    86.92   100.0   87.20   83.79   87.74   91.45   87.74   91.45   88.01   54.25   45.45   87.74   91.16   76.38
IR    95.33   100.0   96.66   89.03   95.33   97.03   95.33   97.03   91.33∘  8.56    7.63    96.00   97.60   86.96
LB    83.38   100.0   83.04   77.38   87.04   88.50   87.04   87.91   81.14   52.14   40.35   86.47   86.65   67.05
SY    82.15   100.0   83.83•  78.38   87.15•  92.09   87.28•  91.65   86.22•  87.96   68.94   86.22•  88.58   69.43
VE    69.43   100.0   68.13   72.36   69.53   80.33   69.53   80.33   68.36   67.40   48.77   68.38   72.23   52.27
VT    86.65   100.0   90.78•  79.23   92.60•  95.47   92.60•  95.47   91.96•  40.39   32.00   92.86•  95.03   75.30
Av    81.06   100.0   81.40   84.22   83.04   86.05   83.14   89.72   79.33   46.82   38.98   83.19   88.68   74.86


We highlight several observations from Table 2 that speak in favour of the 
DCBM model. 

– Reducing the case base during the acquisition stage is not enough. As the 
ACCM column shows, and as noted previously, it is necessary to delete 
“harmful” knowledge during the problem solving process. 

– Applying DCBM during the problem solving process helps the system to obtain 
a more accurate and reduced case base. 

– The reduction obtained using DCBM improves the prediction accuracy of the 
standard 1NN algorithm, with the exception of the combination of ACCM and 
OL. That combination does not work because it is too aggressive with the 
case base, as anticipated when it was defined. 

– On the other hand, OL works properly if it is not combined with ACCM, 
even though its policy for selecting the cases to be removed from the case 
base produces a large reduction. In conclusion, OL should only be applied 
alone. 

– The combination of ACCM with OCE often does not improve the performance 
of OCE applied alone. However, the combination achieves a higher reduction 
than OCE alone and, on average, also improves the prediction accuracy of 
1NN. 



5 Conclusions 

This paper proposes a model for case base maintenance that uses the dynamics of 
the problem solving process to search for the optimal case base while maintaining 
the prediction accuracy. The experimental study demonstrates that the DCBM 
model, using different policies, achieves its initial objectives: it optimizes 
the case base while, on average, it improves the prediction accuracy of the 
system. Our further work will focus on testing the model in recommender systems 
in order to analyze a dynamic environment with our dynamic model. We also 
plan to test different case reduction methods in the acquisition stage. 



Abstract. This paper describes a Case Base Seeding system (CBS) that can be 
used to seed a case base with some random cases in order to provide minimal 
conditions for the empirical tests of a Case-Based Planning System (CBP). 
Random case bases are necessary to guarantee that the results of the tests are 
not manipulated. Although this kind of case base is important, there are no 
references to CBS systems in the literature, even from those CBP systems 
that claim to use similar systems. Therefore, this paper tries to overcome 
this deficiency by modeling and implementing a complete random Case Base 
Seeding process. 



1 Introduction 

There are many case-based planning (CBP) systems in the literature [4], [5], [7], [12], 
[13]. A CBP system is a planner that retrieves from a case base potential cases that 
can become solutions to problems. Any case-based planner needs a case base with 
suitable cases that allows the system to work properly and to have some 
advantages over generative planning systems. 

In contrast to CBP systems, Case Base Seeding systems are rarely described in the 
literature, although they are important for CBP systems. In fact, most CBP systems 
seed their case bases in order to perform their tests. However, these seeding processes 
are either composed by hand or specific to some domains. In addition, no 
explanations of them are available in the literature. 

This lack of a generic and random seeding process in the literature can become an 
obstacle for CBP researchers who need to perform empirical tests and, 
consequently, to analyze the performance of their systems suitably. 

In this paper, a Case Base Seeding system is presented by focusing on how to 
create random and suitable states in planning domains. The seeding system creates a 
random problem by creating consistent initial and goal states. This random problem is 
then solved by a generative planner, called FF [3], which produces a solution (a plan) 
and, consequently, a new case for the case base. 

This paper is a detailed extension of the CBS system used in [12] and is 
structured as follows: Section 2 describes some case bases used by CBP systems. 
Section 3 details the main part of the CBS system, followed by Section 4, which 
describes the complete CBS system. Section 5 discusses some issues of the CBS 
system and, finally, Section 6 concludes this paper. 

2 Case Bases in CBP Systems 

A Case-Based Planning system is a planner that uses past experience, stored in cases, 
to solve problems. Cases in a CBP system are previous plans or traces of planning 
solutions. In order to be efficient, a Case-Based Planner must use a case 
base with a large variety of cases. This is because the performance of a CBP system 
depends on the cases in the case base, since the adaptation of a similar case can be 
computationally expensive in general. Therefore, retrieving a very similar case, or 
even a complete-solution case, improves the performance of the CBP system by 
decreasing the time spent in the adaptation phase. Obviously, anyone can design and 
create, by hand, a case base with many cases that are very similar to some specific 
problems in order to increase the performance of the CBP system. 

However, to perform any empirical test of a CBP system, a case base with random 
cases must be available. This random case base avoids any kind of manipulation 
that lets a system work with many very similar cases and that consequently 
makes the tests not completely valid. The problem is that there is no schema or 
formal process for creating case bases with random cases available in the literature. 

Some CBP systems use pre-defined or given examples to fill their case bases or 
to perform their tests, e.g., DerSNLP+EBL [4]. Other systems, like CAPER [5] and 
Caplan/CBC [7], create cases automatically, but they either need a start pivot 
(Caplan/CBC) or were designed for some specific domains (CAPER - UM Translog 
Domain). 

Prodigy/Analogy [13] is another CBP system that uses an automatic case creation 
process. This process creates random problems and, consequently, random 
cases. However, it is specific to some domains, like the logistic transportation, 
one-way-rocket and machine-shop domains, and Veloso [13] does not provide any 
details of how this seeding system really works. 

To summarize, the case creation processes used by CBP systems are either 
domain specific or lack any specification of how they work. In 
other words, there is no generic process that researchers can use to create random 
cases for CBP systems. 

This paper intends to overcome this deficiency by defining a Case Base Seeding 
system that can be used by any Case-Based Planner to create its own random case 
base. It is a detailed and extended version of the seeding system presented and used in 
the Far-Off system [12]. 



3 Creating Random Consistent States 

A Case Base Seeding system can be designed by using a generative planning system 
to produce sound plan-solutions and, consequently, suitable cases. However, this 
generative planner can only perform its task if an initial state and a goal state are 
available. In fact, the most important piece of a CBS system is a process that creates 
correct and consistent states in the application’s domain. A state in a planning domain 
is a set of instantiated predicates [12]. 

In order to generate random and consistent states in a certain domain, some 
additional information must be added by the user. Usually, the only information 
available to a planning system is the domain and the problem features, both in the 
PDDL language [6]. The domain features describe the actions and the predicates, 
besides the types of the elements that the predicates can handle. The problem features 
describe the initial state, the goal state and the elements of each type in the domain. 

Each predicate can handle some specific types of elements, which are the possible 
values for the predicate’s free variables. A predicate becomes a fact when all its 
free variables are instantiated, i.e., a fact is a grounded predicate. 

Definition 1 (A Fact). A fact is a grounded (instantiated) predicate. 

Definition 2 (The Predicate of a Fact). The predicate of a fact is the non-grounded 
form of the fact. 

However, this information about domain, predicates, actions and problem is not 
enough for a CBS system, i.e., it is not possible to create consistent states with this 
information only. 

Therefore, additional information, containing the semantics of each predicate and 
its relations with other predicates, is necessary. For example, in the Blocks World 
domain, this additional information must define that the holding and handempty 
predicates cannot appear together in a state. In fact, this additional knowledge is 
difficult to obtain from the information described in the PDDL language. 

This additional information, from now on simply called the domain semantics, 
encapsulates semantic features of the domain in terms of predicate relations that 
cannot be extracted from actions, but only from features of consistent states. 

To describe the relations among predicates, some definitions are stated. These 
definitions must encapsulate negative and positive interactions: 

Definition 3 (Positive Existence). The Positive Existence of a predicate p is the set of 
predicates and facts that must be in the state where p is true. 

Definition 4 (Negative Existence). The Negative Existence of a predicate p is the set 
of predicates and facts that cannot be in the state where p is true. 

Definition 5 (Positive Absence). The Positive Absence of a predicate p is the set of 
predicates and facts that must be in the state where p is not true. 

Definition 6 (Negative Absence). The Negative Absence of a predicate p is the set of 
predicates and facts that cannot be in a state where p is not true. 

In order to clarify the use and the importance of the definitions above, consider the 
Blocks World domain with 4 blocks (A, B, C and D). In this domain, the 
facts on(A,C), holding(A) and clear(B) are in the negative 
existence of the fact on(A,B). This means that the facts on(A,C), holding(A) and 
clear(B) cannot appear together with the fact on(A,B) in the same state. In fact, not 
only on(A,C) but any fact of the form on(A,x) cannot exist in the same state as 
on(A,B). 

The seeding system permits a generic specification of negative and positive 
existence and absence. For example, for an on(x,y) predicate, the following predicates 
cannot be in the same state: holding(x), clear(y) and on(x,_). The other definitions 
encapsulate the complementary three possibilities of Existence and Absence relations 
of a specific predicate. There are some facts, however, that must be in all states of the 
domain. They are called fixed facts: 

Definition 7 (Fixed Facts). The Fixed Facts are the set of facts that must be true in all 
states of the domain. 

In a typed domain, some predicates restrict their variables to some values. An 
example can be found in the Logistic domain. The predicate at(x,y) means that x is at 
y, where x is a package or a vehicle and y must be a location like a city or an airport. 
However, the semantics of this predicate in a domain is restricted to 
at(airplane, airport) and at(package,y), i.e., when the x variable is instantiated with an 
airplane, the location denoted by y must necessarily be an airport. On the other hand, 
the type package does not impose any restriction on the y variable. This 
information is not explicit in the domain’s definition. To describe such a feature the 
following definition is stated: 

Definition 8 (Restricted Facts). The Restricted Facts are the set of facts of a specific 
predicate that can really exist in a certain domain. 

The purpose of the above definition is to restrict the facts that can be created 
from the predicate definition and all the elements encapsulated in each type. These 
restricted facts must be provided by the user because there is no information about 
them in the problem or domain features. They can also be automatically extracted from 
the domain specification through a state invariant extractor, like TIM [1]. 

Some predicates are even more restricted. Some can appear in a state at most 
once and others must appear at least once. This feature is defined as follows: 

Definition 9 (At-Least-One Predicate). An At-Least-One predicate is a predicate 
whose facts must be true in a state at least one time. 

Definition 10 (At-Most-One Predicate). An At-Most-One predicate is a 
predicate whose facts can be true in a state at most one time. 

With all the above definitions, a process to create a consistent state is defined in 
Figure 1. As stated before, a process to create a consistent state is necessary to define 
a process that creates a case base. In order to produce a consistent state, a set of all 
facts of the domain is necessary. 




Fig. 1. The diagram of the consistent state creation process 

The process to create this set, called the Available and not Allocate predicates set 
(AnA), follows an initial seed example and the restrictions configured by the user. 

This initial seed is an example of a problem, including an initial state and all the 
elements of each type of the domain. This information can also be given by the user, 
as in the Prodigy/Analogy system [13]. 

The most important contribution of the seed example is the determination of the 
values that can instantiate a fixed predicate. Since the fixed facts are defined as those 
facts that must be in all states, they will also be in the initial state of the seed 
example. Therefore, the possible facts of each fixed predicate can be easily extracted 
from the seed example. 

Besides the number and specification of the elements and the determination of 
fixed facts, the seed example is also necessary to provide important information about 
the consistency between the initial state and the goal state. Some predicates accept 
only some values for their variables, and these values are defined in an example of a 
possible initial state. For example, in the logistic domain, the predicate at(truck,city) 
cannot be instantiated with any truck and any city, because some trucks are specific 
to some cities and they cannot travel to other cities. So, the CBS system must 
consider this information. 

In order to complete the restriction specified in Definition 8, a specific and special 
variable must be used to extract information from the initial state of the seed problem 
example. This variable, called InSt, restricts the values of a specific variable to those 
present in the seed example. 

Observe that the seed example saves the user from specifying all the restricted facts, 
all the fixed facts, and all the elements of the domain by hand. The process to create 
the AnA set follows the rules specified below: 

♦ Extract from the seed example all elements of each type. 

♦ For each predicate, generate all possible combinations of elements in its 
variables, considering: 

    ♦ Each element must instantiate one and only one variable of the predicate, 
    eliminating any fact that has a repeated element in its variables. 

    ♦ If the predicate is a fixed fact, then consider only the facts that exist in the 
    seed example. 

    ♦ If an element will instantiate a variable labeled as InSt, then consider only 
    the facts that match any other fact in the seed example. 
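
The rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used by the CBS system: facts are represented as tuples such as ("at", "t1", "l1"), the function and parameter names are hypothetical, and the InSt rule is read as "the value at an InSt position must occur at that position in some seed fact of the same predicate", which is only one possible interpretation of the text.

```python
from itertools import product


def build_ana(signatures, elements, seed_state, fixed=(), in_st=None):
    """Generate the AnA set of candidate facts.

    signatures: predicate name -> tuple with the type of each variable.
    elements:   type name -> elements of that type found in the seed example.
    seed_state: set of facts in the initial state of the seed example.
    fixed:      names of the predicates configured as fixed.
    in_st:      predicate name -> positions of variables labeled InSt
                (assumed reading of the InSt rule, see the lead-in).
    """
    in_st = in_st or {}
    ana = set()
    for name, types in signatures.items():
        seed_facts = {f for f in seed_state if f[0] == name}
        for args in product(*(elements[t] for t in types)):
            if len(set(args)) != len(args):
                continue  # an element may instantiate only one variable
            fact = (name, *args)
            if name in fixed and fact not in seed_facts:
                continue  # fixed predicates: keep only facts seen in the seed
            if any(args[i] not in {s[i + 1] for s in seed_facts}
                   for i in in_st.get(name, ())):
                continue  # InSt variables: keep only values seen in the seed
            ana.add(fact)
    return ana


elements = {"truck": ["t1"], "location": ["l1", "l2"], "city": ["c1", "c2"]}
signatures = {"in_city": ("location", "city"), "at": ("truck", "location")}
seed_state = {("in_city", "l1", "c1"), ("in_city", "l2", "c2"), ("at", "t1", "l1")}

ana = build_ana(signatures, elements, seed_state,
                fixed={"in_city"}, in_st={"at": {1}})
ana_free = build_ana(signatures, elements, seed_state, fixed={"in_city"})
```

In this toy logistics example the fixed predicate in_city contributes only its two seed facts. With the InSt label on the location variable of at, the truck can only be placed at l1; without the label, at(t1,l2) is generated as well.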

During the creation of the AnA set, all facts that are grounded At-Least-One 
predicates compose a specific subset and all facts configured as Fixed Facts compose 
another specific subset. 

The second step is to choose one fact from AnA to be added to the in-creation state. 
This choice follows a heuristic that selects the most restricted predicates 
first. These restricted predicates are those classified as Fixed and At-Least-One 
predicates. This is because both kinds of predicates must be in all states and they can 
probably prune other facts. 

The process chooses the Fixed Facts first and then selects At-Least-One predicates 
randomly. Each selection fires the Consistency Mechanism, which applies its 
rules. When no Fixed facts and no At-Least-One predicates other than those already 
chosen remain available in the AnA set, the process starts the generic random process, 
which can choose any fact present in AnA. 

The CBS system uses a set of consistency rules that guarantee the creation of a 
consistent state (the in-creation State). The consistency rules are allocated to two 
Consistency Mechanisms: the Insertion and the Deletion Consistency Mechanisms. 

Definition 11 (Rules of the Consistency Mechanism). 

In the Insertion Consistency Mechanism, there are three rules. For each insertion of a 
fact f in the in-creation State, where f is a grounded predicate of the general 
predicate p: 

♦ Consistency Rule 1 (ICM-R1): 

All facts that fit in the Positive Existence predicates of p are included in the 
in-creation State and deleted from the AnA set. 

♦ Consistency Rule 2 (ICM-R2): 

All facts that fit in the Negative Existence predicates of p are deleted from the AnA 
set. 

♦ Consistency Rule 3 (ICM-R3): 

If the predicate p is configured as an At-Most-One predicate, all facts of predicate 
p are deleted from AnA. 

In the same way, the Deletion Consistency Mechanism has two rules. For each 
fact f deleted from AnA, where f is a grounded predicate of the general predicate p: 

♦ Consistency Rule 1 (DCM-R1): 

Each fact that fits in the Negative Absence of p must also be deleted from AnA. 

♦ Consistency Rule 2 (DCM-R2): 

Each fact that fits in the Positive Absence of p must be inserted in the in-creation 
State and deleted from AnA. 

Therefore, the CBS system repeatedly performs an insertion followed by the 
application of the consistency rules. When AnA becomes empty, the process stops and 
the in-creation State becomes a potential consistent state following the user 
configuration. 

During the application of the consistency rules, it is possible to detect that a 
failure has occurred and that a consistent state cannot be created. For example, the 
rule DCM-R2 can detect that there is no predicate that it must insert. Therefore, if 
any rule fails to accomplish its task, no consistent state is guaranteed to be created. 

The deletion and insertion processes performed by the consistency rules are fired 
when a fact is deleted from AnA or inserted in the in-creation State. Since this 
process is not cyclic, it cannot fall into an infinite recursive loop caused by a 
wrong configuration, which would probably be reported as a failure. 
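
The state creation loop and the two Consistency Mechanisms can be sketched as follows. This is an illustrative sketch only, not the CBS implementation: facts are tuples, the four relations are passed as hypothetical callables that map a ground fact to the ground facts it constrains, and the Blocks World constraints beyond holding(x), clear(y) and on(x,_) are assumptions added to make the pairwise relation symmetric. Cycle-free stacking is not checked.

```python
import random


class InconsistentState(Exception):
    """Raised when a consistency rule cannot be satisfied."""


def create_state(ana, fixed, at_least_one, at_most_one,
                 pos_exist, neg_exist, pos_absent, neg_absent, rng):
    """Build one state from the fact pool `ana` (a set of ground facts)."""
    ana, state = set(ana), set()

    def insert(fact):  # Insertion Consistency Mechanism
        if fact in state:
            return
        if fact not in ana:
            raise InconsistentState(f"cannot insert {fact}")
        ana.discard(fact)
        state.add(fact)
        for g in pos_exist(fact):          # ICM-R1
            insert(g)
        for g in neg_exist(fact):          # ICM-R2
            delete(g)
        if fact[0] in at_most_one:         # ICM-R3
            for g in [h for h in ana if h[0] == fact[0]]:
                delete(g)

    def delete(fact):  # Deletion Consistency Mechanism
        if fact in state:
            raise InconsistentState(f"{fact} must be absent but is in the state")
        if fact not in ana:
            return                         # already deleted earlier
        ana.discard(fact)
        for g in neg_absent(fact):         # DCM-R1
            delete(g)
        for g in pos_absent(fact):         # DCM-R2
            insert(g)

    rest = sorted(ana - set(fixed))
    rng.shuffle(rest)
    # Most restricted first: fixed facts, then At-Least-One facts, then the rest.
    order = (sorted(fixed)
             + [f for f in rest if f[0] in at_least_one]
             + [f for f in rest if f[0] not in at_least_one])
    for fact in order:
        if fact in ana:
            insert(fact)
    return state


blocks = ["A", "B", "C"]
facts = ({("handempty",)}
         | {("holding", b) for b in blocks}
         | {("clear", b) for b in blocks}
         | {("on", x, y) for x in blocks for y in blocks if x != y})


def excludes(f, g):
    """Assumed Blocks World Negative Existence of f, checked against g."""
    if f[0] == "on":
        x, y = f[1], f[2]
        if g in {("holding", x), ("holding", y), ("clear", y), ("on", y, x)}:
            return True
        return g[0] == "on" and g != f and (g[1] == x or g[2] == y)
    if f[0] == "holding":
        return g in {("handempty",), ("clear", f[1])}
    return False


def conflicts(f, g):
    return excludes(f, g) or excludes(g, f)


def neg_exist(f):
    return {g for g in facts if g != f and conflicts(f, g)}


def no_facts(f):
    return ()


state = create_state(facts, fixed=[], at_least_one=set(), at_most_one={"holding"},
                     pos_exist=no_facts, neg_exist=neg_exist,
                     pos_absent=no_facts, neg_absent=no_facts,
                     rng=random.Random(0))
```

Each fact leaves AnA at most once, so the mutual recursion between insert and delete terminates. In the example only Negative Existence and the At-Most-One restriction on holding are configured, so the returned state is free of pairwise conflicts but is not guaranteed to be a complete Blocks World state.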



4 Creating a Random Case Base 

With a random state creation process defined, the entire case base creation 
process can be designed. As mentioned before, a case is made from a plan generated by 
a planner. This planner is the generative planner called Fast-Forward (FF) [3], 
which is a heuristic search planner. The FF system uses a heuristic (the FF-heuristic) 
to perform its planning task: to find a plan from a given initial state to another state 
where the goal is satisfied. 

4.1 Creating Random Initial and Goal States 

The process to create random and consistent states, described before, is used to create 
the initial and the goal states. Although the initial state is a complete and consistent 
state that is delivered directly by the state creation process, the goal state is not. 
In fact, the difference between the initial state and the goal state is that the latter 
need not be a complete state. In theory, a goal state may be a complete state; however, 
as verified in many available planning problems for different domains, only some 
predicates are allowed to constitute a goal state. For example, in the Blocks World 
domain, only the predicate on(x,y) is relevant to the goal. The same observation can 
be made for the at(x,y) predicate in the Logistic domain and the served(x) predicate in 
the Miconic domain. 

Therefore, to make the goal state similar to those presented in the problems, the 
relevant predicates are defined: 

Definition 12 (Relevant Predicates). The Relevant Predicates are those predicates 
that can be part of a goal state. 

The relevant predicates of a domain must be configured by the user or extracted from 
a seed example. The goal state has another restriction: the number of relevant 
predicates is also random. For example, in the Blocks World domain with 10 blocks, a 
goal can be a composition of only two on(x,y) facts, like on(A,B) and on(B,C), 
and not a composition of all possible and consistent combinations of the on(x,y) 
predicate. 

Therefore, the process to create a goal state requires more than the straightforward 
use of a complete random state. It requires two filters: the Relevant Predicates Filter 
and the Random Number of Facts Filter. These filters are applied to a random and 
consistent state, called the first-stage goal state, created by the process described 
above. 

Both filters prune the predicates in the goal state. The Relevant Predicates Filter 
deletes any fact from the first-stage goal state that is not a relevant predicate, 
creating the second-stage goal state with only relevant predicates. 

The second filter, the Random Number of Facts Filter, is then applied. Its first step 
is to choose randomly the number of facts, called Gn, which will be left in the 
state. This number ranges from 1 to the number of facts in the second-stage 
state. The second step is to choose randomly Gn facts of the second stage, creating the 
third (and final) stage of the goal state. 




Fig. 2. The complete model of the Case Base Seeding Process 






A Case Base Seeding for Case-Based Planning Systems 






The final stage of the goal state is thus composed of Gn relevant facts taken from a 
consistent and complete state produced by the creation process. 
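
The two filters can be sketched in Python as follows. The function name and the tuple representation of facts are assumptions made for this example; the behaviour when no relevant fact survives the first filter is not specified in the text, so the sketch simply raises an error.

```python
import random


def make_goal_state(first_stage, relevant_predicates, rng):
    """Apply the Relevant Predicates Filter and the Random Number of Facts Filter."""
    # Relevant Predicates Filter: keep only facts of relevant predicates.
    second_stage = sorted(f for f in first_stage if f[0] in relevant_predicates)
    if not second_stage:
        # Assumption: the text does not say what happens in this case.
        raise ValueError("no relevant facts in the first-stage goal state")
    # Random Number of Facts Filter: choose Gn, then Gn facts at random.
    gn = rng.randint(1, len(second_stage))
    return set(rng.sample(second_stage, gn))


first_stage = {("on", "A", "B"), ("on", "B", "C"), ("clear", "A"),
               ("ontable", "C"), ("handempty",)}
goal = make_goal_state(first_stage, {"on"}, random.Random(1))
```

For this Blocks World state the goal contains one or two on facts, depending on the random choice of Gn.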

4.2 Producing a Plan 

With the initial and goal states established, the process to produce a plan is about to
start. However, one last verification is necessary. So far, consistent initial and
goal states have been created, but nothing guarantees that both states are consistent with
each other. As a last verification, the FF-heuristic is used to estimate the number of
actions between the initial and goal states. The FF-heuristic is very useful for many
case-based planning purposes [10], because it is a good estimate of the possible
number of actions between a state and the goal.

In order to guarantee the consistency between the initial and goal states, the
FF-heuristic result, applied to both states, must be a finite number. If the estimate
is infinite, then there is no possible solution between those states. In this case,
the process restarts and other consistent states are generated as initial and goal states.
Otherwise, if the estimate is a finite number, then the FF planner is fired to
produce a plan between the initial and goal states. This plan is transformed into a new case.
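
The generate-and-test loop described above can be sketched as follows. All four callables are hypothetical stand-ins (a state generator for each side, a heuristic estimator that returns infinity for unreachable goals, and a generative planner); the bound on the number of restarts is also an assumption, since the text does not give one.

```python
import math

def seed_case(make_initial, make_goal, estimate, plan, max_tries=1000):
    """Draw state pairs until the heuristic estimate is finite, then plan.

    Hypothetical interfaces:
      make_initial(), make_goal() -> a consistent random state
      estimate(init, goal)        -> estimated number of actions, math.inf if none
      plan(init, goal)            -> a plan, e.g. a list of actions
    """
    for _ in range(max_tries):
        init, goal = make_initial(), make_goal()
        if math.isinf(estimate(init, goal)):
            continue  # the two states are inconsistent with each other: restart
        # A finite estimate: call the planner and store the result as a case.
        return {"initial": init, "goal": goal, "plan": plan(init, goal)}
    return None  # assumption: give up after max_tries restarts
```

In this sketch a case is a plain dictionary; a real case base would probably store additional indexing information next to the plan.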

A generative planner is necessary because cases are composed of plans or traces of
them. Given a sound plan as a solution for a problem, a case can be created.
Figure 2 summarizes the case base creation process in a more general view.

4.3 Maintenance Process for Case Bases 

As a post-seeding process, a maintenance policy must be applied to the case base in
order to keep it representative and non-redundant. For that, any maintenance policy
found in the literature, like min-injury [11], Type-based [9] and case-addition policy
[14], can be applied to the case base in order to improve its competence. The
competence of a case is the range of problems that it can solve [8].

Fig. 3. Example of the at(obj,Loc) predicate configuration in the CBS (predicate
relations in the legend: Positive Existence, Negative Existence, Negative Absence,
Positive Absence)

The maintenance policies remove the redundant cases and other cases that do
not contribute to increasing the competence, but only create 'hot spots', that is, spots
with a concentration of cases. After the use of any maintenance process, a reduced case
base is obtained, keeping its original competence and a uniform distribution of cases.

If the reduced case base becomes smaller than necessary, the seeding process can be
called to produce more cases and fill the case base up again. This creates a cycle of
seeding and maintenance that increases the quality and the
representativeness of the case base.



5 Empirical Tests and Discussion 

The Case Base Seeding process presented in this paper was implemented and used in
the Far-Off system [12]. Figure 3 shows the configuration window of the CBS
used by the Far-Off system.

In the Far-Off system empirical tests, this CBS was used to create case bases of
more than 3000 cases for each problem and domain. The creation of those case bases
allowed diversified tests in many STRIPS domains, as shown in [12]. It is important
to highlight that the CBS described in this paper concerns STRIPS-model
domains. Therefore, the CBS system is not suitable for describing the semantics of all
existing planning domains. This is because the definitions in this paper were extracted
only from STRIPS domains, most of which were used in planning competitions.

The CBS system can be improved to work with more complex domains that handle
numerical parameters and resources. In addition, it can also be improved by using a
learning system that can learn and extract information from states, problems and
domains defined in the PDDL language. An algorithm called DISCOPLAN [2] discovers
state constraints (invariants) of a domain automatically. It uses this information to
improve the planning efficiency. It can discover, for example, some information
about type constraints, like those defined in Definition 8, or extract some relations
like those defined from Definition 3 to Definition 6. In the future, the CBS system can
use DISCOPLAN or TIM [1] to extract information automatically.



6 Conclusion 

This paper describes a Case Base Seeding system (CBS) that can be used to construct 
a case base with random cases. It is suitable for empirical tests of Case-Based 
Planning Systems (CBP). 

Seeding systems are not easily found in the literature, although many CBP systems
use random case bases in their tests. This paper describes a complete and
generic CBS system that can be applied to STRIPS-model domains.

The CBS system can be improved in the future by incorporating more complex
features and by using automatic processes to extract state constraints and
invariants.



Abstract. Nowadays, one of the main techniques used in heuristic planning
is the generation of a relaxed planning graph, based on a Graphplan-like
expansion. Planners like FF or MIPS use this type of graph
in order to compute distance-based heuristics during the planning process.
This paper presents a new approach to extend the functionality
of these graphs in order to manage numeric optimization criteria (problem
metric), instead of only plan length optimization. This extension
leads to more informed relaxed plans, without significantly increasing
the computational cost. Planners that use the relaxed plans for further
refinements can take advantage of this additional information to compute
better quality plans.



1 Introduction 

The problem of domain-independent planning is a very complex problem. In fact,
the problem is PSPACE-complete even if some severe restrictions are applied
[2]. Nowadays, one of the best techniques to deal with these complex problems
is heuristic planning. Planners like FF [8], LPG [6], VHPOP [12] or MIPS
[4] use different heuristic functions to guide the search, and all of them have
demonstrated a very competitive performance.

The general principle for deriving heuristics is to formulate a simplified (or
relaxed) version of the problem. Solving the relaxed problem is, in general, easier
than solving the original problem. The solution of the relaxed problem is used
as a heuristic to estimate the distance to the goal. One of the most common
relaxations is to ignore the negative effects (or delete list in STRIPS notation)
of the actions. This idea was first proposed by McDermott [9] and, since then,
a large number of planners have adopted this technique (e.g., HSP [1], GRT [10],
Sapa [3], FF, etc.).

Nowadays, the planning community is working on extending the functionality
of their planners to deal with more expressive problems. Many real-world
problems have several characteristics that can hardly be expressed in pure
propositional STRIPS: complex action conditions, numeric functions, durative actions,
uncertainty, etc. Most of these features can be modeled with the PDDL 2.1 [5]
language, but planners must be adapted to support the new extensions. One
of the main contributions of PDDL 2.1 is the possibility of specifying an
optimization criterion for a planning problem. This criterion, called problem metric,
consists of a numeric expression, which has to be maximized or minimized.
Therefore, the user can ask the planner to optimize, for example, the fuel consumption
in a transportation problem, rather than commonly used criteria such as the plan
length or plan duration.

In this paper we propose a new technique to extend the relaxed planning
graphs. This technique makes it possible to deal with numeric optimization criteria.
Although there are heuristic planners that can handle problem metrics (like Metric-FF
[7] or MIPS [4]), their heuristic functions are still based on the plan length of the
relaxed problem solution. This paper shows how this technique can be applied
to improve the quality of the heuristic functions and, consequently, the quality
of the final plans.

2 The Relaxed Planning Graph 

The relaxed planning graph (RPG) is a graph based on a Graphplan-like
expansion where delete effects are ignored. In this section we describe the traditional
generation of an RPG, although taking into account the numeric part of PDDL
2.1. Firstly, we have to formalize some concepts. A (numeric) planning problem
is defined as a tuple (F, P, A, I, G, M), where:

— F is a set of numeric variables, called fluents. 

— P is a set of logical propositions. 

— A is the set of actions. 

— I is the initial state. 

— G is a set of goals. 

— M is the problem metric (numeric optimization criterion). 

The numeric extensions of PDDL 2.1 imply that, in addition to the propositions
P, we have a set F of numeric variables (fluents). A state s is thus defined
as a set of propositions p(s) and a set of rational numbers v(s) that represent
the values for each fluent in that state:

$s = (p(s), v(s)) \;/\; p(s) \subseteq P \,\wedge\, v(s) = \{v_1, \ldots, v_n\} : value(f_i, s) = v_i, \; \forall f_i \in F$

An expression is an arithmetic expression over F and the rational numbers,
using the operators +, -, * and /. The value of an expression exp in a state s is
represented as value(exp, s). A numeric constraint is a triple (exp, comp, exp')
where exp and exp' are expressions, and comp ∈ {<, ≤, =, ≥, >} is a comparator.
A numeric effect is a triple (f_i, ass, exp) where f_i ∈ F is a fluent,
ass ∈ {:=, +=, -=, *=, /=} is an assignment operator, and exp is an
expression. The outcome of applying a numeric effect in a state s, written
nresult((f_i, ass, exp), s), is another state in which the value of fluent f_i has
been modified with the value of exp, using the assignment operator ass.

    t = 0; P_0 = s_0                              // Initialization
    while G ⊄ P_t do                              // New expansion stage
        A_t = {a ∈ A / pprec(a) ⊆ P_t}            // Action level
        P_{t+1} = P_t ∪ add(a), ∀a ∈ A_t          // Proposition level
        if P_{t+1} = P_t then fail endif
        t = t + 1
    endwhile

Fig. 1. Traditional RPG expansion



Actions in this numeric framework can have numeric preconditions and effects.
Therefore, the preconditions of an action a ∈ A can be propositional,
pprec(a) ⊆ P, or numeric constraints, nprec(a). Likewise, the effects of a can be
propositional, add(a) and del(a) for positive and negative effects respectively, or
numeric effects, neff(a). Regarding the problem goals, we restrict ourselves to
propositional goals (G ⊆ P) for simplicity. This is the same simplification that
the Sapa planner [3] makes. However, the techniques presented in this paper can
be easily translated to other approaches (like Metric-FF [7], which only ignores
decreasing numeric effects). Finally, the problem metric (M) is an expression
whose value must be minimized. A maximization problem can be turned into
a minimization problem just by multiplying the metric expression by -1. Now, we
can describe a relaxed planning problem, which ignores all delete effects of all
actions.
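
The numeric effects defined above can be made concrete with a short Python sketch. The representation is an assumption chosen for brevity: a fluent valuation is a dictionary, an expression is a callable over that dictionary, and the fluent names are invented for the example.

```python
from fractions import Fraction
import operator

# Assignment operators of a numeric effect, as functions (old value, exp value) -> new value.
ASSIGN = {
    ":=": lambda old, val: val,
    "+=": operator.add,
    "-=": operator.sub,
    "*=": operator.mul,
    "/=": operator.truediv,
}

def nresult(effect, fluents):
    """Return a new fluent valuation after applying the effect (fluent, ass, exp)."""
    fluent, ass, exp = effect
    new = dict(fluents)  # the input valuation is left untouched
    new[fluent] = ASSIGN[ass](fluents[fluent], exp(fluents))
    return new

s0 = {"driven": Fraction(0), "fuel": Fraction(10)}
s1 = nresult(("driven", "+=", lambda v: Fraction(5)), s0)
s2 = nresult(("fuel", "/=", lambda v: v["driven"] - 3), s1)
print(s1, s2)
```

Rational values are modelled with `fractions.Fraction` so that repeated updates stay exact, in line with the definition of v(s) as a set of rational numbers.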

Definition 1. Assuming a planning problem P_p = (F, P, A, I, G, M), the
relaxation a⁺ of an action a ∈ A, a = (pprec(a), nprec(a), add(a), del(a), neff(a)),
is defined as:

$a^{+} = (pprec(a), \emptyset, add(a), \emptyset, \emptyset)$

The relaxed planning problem is P_p⁺ = (F, P, A⁺, I, G, M), where
A⁺ = {a⁺ / a ∈ A}.

The first step to compute a solution for a relaxed planning problem is to build the
RPG. The traditional RPG building algorithm generates proposition and action
levels alternately. The first level (P_0) is a proposition level which contains all the
propositions that are true in the starting state (s_0). Action levels contain all actions
that are applicable in the previous level, and the following proposition levels are
extended with the add effects of these actions. The expansion of the RPG finishes
when a proposition level containing all top-level goals is reached, or when it is not
possible to apply any new action. Figure 1 shows this process.
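
As an illustration, the traditional expansion can be written as a minimal Python sketch. The tuple layout of actions and the small Logistics-like instance (one truck T, one package P, three cities) are assumptions made for this example only.

```python
def expand_rpg(s0, goals, actions):
    """Traditional RPG expansion over relaxed actions (name, pprec, add).

    Returns (proposition_levels, action_levels), or None if the goals
    cannot be reached even when delete effects are ignored.
    """
    goals = frozenset(goals)
    props = [frozenset(s0)]          # P_0 = s_0
    acts = []
    while not goals <= props[-1]:
        # Action level: every action whose preconditions hold in P_t.
        level = [a for a in actions if a[1] <= props[-1]]
        # Proposition level: P_t plus the add effects of those actions.
        nxt = props[-1].union(*(a[2] for a in level))
        if nxt == props[-1]:
            return None              # fixpoint reached without the goals
        acts.append(level)
        props.append(nxt)
    return props, acts

cities = ["C1", "C2", "C3"]
actions = []
for c in cities:
    actions.append((f"Load {c}", frozenset({f"at T {c}", f"at P {c}"}), frozenset({"in P T"})))
    actions.append((f"Unload {c}", frozenset({f"at T {c}", "in P T"}), frozenset({f"at P {c}"})))
    for d in cities:
        if c != d:
            actions.append((f"Drive {c} {d}", frozenset({f"at T {c}"}), frozenset({f"at T {d}"})))

s0 = {"at T C1", "at P C2"}
goals = {"at P C3"}
props, acts = expand_rpg(s0, goals, actions)
print(len(props), [len(level) for level in acts])
```

Because every level only adds propositions, the loop stops after at most |P| iterations, which is consistent with the polynomial bound usually quoted for this construction.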

3 Handling Optimization Criteria 

The basic idea for taking the optimization criterion into account in the RPG stage
is to include some information about the cost of the actions according to the problem
metric. Nevertheless, considering all the different situations that can arise
due to modifications in the fluent values is infeasible. Therefore, our proposal
consists in evaluating (an estimate of) the cost of the actions in the current
state (s_0). This simplification is acceptable in problems where the
costs and the consumption of resources are not very dependent on the state in
which the actions are applied. Most planning problems fulfill this requirement.
For example, the fuel consumption of a plane highly depends on static
information like the flight distance. State-dependent information, like the number
of passengers in a particular flight, has only a slight effect.

    cost(p) = 0 if p ∈ s_0, ∞ otherwise, ∀p ∈ P              // Initialization
    prog_prop = s_0
    while ∃g ∈ G / cost(g) = ∞ do                            // New expansion stage
        if prog_prop = ∅ then fail endif
        c = min(cost(p)), ∀p ∈ prog_prop                     // Level cost
        P_c = {p / p ∈ prog_prop ∧ cost(p) = c}              // Proposition level
        prog_prop = prog_prop − P_c
        A_c = {a ∈ A / cost(p) ≤ c, ∀p ∈ pprec(a)}           // Action level
        for all a ∈ A_c do                                   // Programming action effects
            cost_reach(a) = Σ cost(p), ∀p ∈ pprec(a)
            for all p ∈ add(a) ∧ (cost_reach(a) + cost(a) < cost(p)) do
                prog_prop = prog_prop ∪ {p}
                cost(p) = cost_reach(a) + cost(a)
            endfor
        endfor
    endwhile

Fig. 2. Proposed RPG for problem metric optimization

The main difference with respect to the traditional RPG is that the levels of
the graph do not represent time steps, but costs according to the problem metric
(M). Thus, to compute the levels we have to estimate the cost of applying an
action a:

$cost(a) = value(M, nresult(neff(a), s_0)) - value(M, s_0) + \varepsilon$    (1)

if cost(a) < ε then cost(a) = ε endif

The cost of an action a is computed as the increase in the metric value caused
by the application of a in s_0 (we apply the numeric effects of a to s_0, regardless
of whether the preconditions of a hold in s_0 or not). The small ε included in the
cost represents that every action, even one that does not affect the metric value,
has a cost. This way, if there is no metric defined in the problem, our RPG is
equivalent to the traditional one. Finally, we ensure that the cost is positive. We
ignore the action effects which decrease the metric value, since they would cause an
out-of-order expansion of the graph. These actions are therefore considered to have
the minimum cost. The algorithm for the RPG expansion can be formalized as
Figure 2 shows.
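
Formula (1) can be sketched in a few lines of Python. For brevity this sketch assumes that numeric effects are plain increments given as (fluent, delta) pairs and that the metric is a callable over the fluent valuation; the value chosen for ε is also an assumption.

```python
from fractions import Fraction

EPS = Fraction(1, 1000)  # assumed value for the small constant ε

def action_cost(metric, neff, s0, eps=EPS):
    """Estimate cost(a) as the metric increase caused by neff(a) in s0, plus ε.

    metric: callable mapping a fluent valuation (dict) to a number.
    neff:   list of (fluent, delta) increments, a simplification of numeric effects.
    """
    after = dict(s0)
    for fluent, delta in neff:
        after[fluent] = after[fluent] + delta  # effects applied regardless of preconditions
    cost = metric(after) - metric(s0) + eps
    return max(cost, eps)  # actions that decrease the metric get the minimum cost

s0 = {"driven": Fraction(0)}
metric = lambda v: v["driven"]
drive_cost = action_cost(metric, [("driven", 5)], s0)
load_cost = action_cost(metric, [], s0)
print(drive_cost, load_cost)
```

With a constant metric every action costs exactly ε, so ordering levels by cost degenerates to ordering them by number of actions, which matches the remark that the proposed RPG is then equivalent to the traditional one.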

The algorithm uses the list prog_prop (programmed propositions) in order to
store the propositions which will be inserted in the graph. Each programmed
proposition p has an associated cost cost(p). Initially, the prog_prop list only
contains the propositions that hold in the current state s_0. These propositions
have no cost since they are currently true. The rest of the propositions are not
achieved yet and, therefore, have an infinite cost.

Table 1. Domain description of the example problem

operator  param.    pprec                add        del        neff
Load      ?c        at T ?c ∧ at P ?c    in P T     at P ?c    ∅
Unload    ?c        at T ?c ∧ in P T     at P ?c    in P T     ∅
Drive     ?c1 ?c2   at T ?c1             at T ?c2   at T ?c1   driven += distance ?c1 ?c2

The graph expansion starts with the generation of the first propositional
level. The propositional levels (P_c) are indexed through the cost of their
propositions, so all propositions in a level have the same cost. The level cost (c)
is computed as the minimum cost of the programmed propositions, in order to
build the graph from lower to higher cost values. The respective action level (A_c)
contains the actions whose preconditions have a cost of c or lower. The positive
effects of these actions are added to the prog_prop list only if they have not
been achieved before with a lower cost. Suppose that a is the action that
produces a proposition p; the cost of p (cost(p)) is computed as the sum of:

— The cost of achieving a (cost_reach(a)): this cost is computed as the sum of
the costs of the preconditions of a.

— The cost of applying a (cost(a)), defined in (1).

The RPG expansion finishes when all top-level goals are achieved, or when
the prog_prop list becomes empty. If the prog_prop list is empty, then no new
action can be applied and, therefore, the goals cannot be achieved.

This algorithm can be used to improve the heuristic information extracted 
from the relaxed plans. Section 3.1 shows an example to compare the traditional 
RPG with our proposal. Moreover, the computational complexity of the algo- 
rithm is polynomial since the traditional one is proved to be polynomial [8], 
and the additional calculations (action and proposition costs) can be done in 
polynomial time. 

3.1 Example of the RPG Expansion 

We will illustrate the RPG expansion through a Logistics-like example problem
(see Figure 3). In this problem, there are three cities (C1, C2 and C3), one truck
(T) and one package (P). The truck and the package are initially located in C1
and C2 respectively. The distance between the cities is shown in Figure 3 through
the labels on the roads. The goal is to carry the package to city C3, minimizing the
driven distance (the problem metric is: minimize(driven)). Table 1 summarizes
the domain description.

The RPG for this problem is shown in Table 2. This RPG has more levels than
the traditional one, but the number of actions and propositions per level is lower.

Fig. 3. Initial state in the logistics example problem

Table 2. Relaxed planning graph for the example problem

P_0        A_0           P_5+ε      A_5+ε         P_15+2ε    A_15+2ε
at T C1    Drive C1 C2   at T C3    Drive C3 C1   at T C2    Drive C2 C1
at P C2    Drive C1 C3              Drive C3 C2              Drive C2 C3
                                                             Load C2

P_15+3ε    A_15+3ε       P_15+4ε    A_15+4ε       P_20+5ε
in P T     Unload C1     at P C1    Load C1       at P C3
           Unload C2
           Unload C3

The first action level (A_0) contains the actions that are directly executable: drive
from C1 to C2 and C3. The cost of having truck T in C3 is 5 (since this is the
distance between C1 and C3), so the action effect (at T C3) will be programmed
with the cost 5 + ε. Actions which do not affect the metric, like the load and
unload operations, have a cost of ε. The expansion finishes when the goal (at P
C3) is achieved.
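
The expansion of this example can be reproduced with a short Python sketch of the algorithm in Figure 2. Several values are assumptions: ε is set to 1/1000, the C3-C2 distance of 10 is inferred from the level costs in Table 2, and the C1-C2 distance of 20 is a guess, since Figure 3 is not available here. The sketch follows the loop condition of Figure 2 literally, so it stops as soon as the goal has been programmed with a finite cost and does not materialise the later levels listed in Table 2.

```python
from fractions import Fraction

EPS = Fraction(1, 1000)  # assumed value for the small constant ε
# C1-C3 = 5 is given in the text; C3-C2 = 10 is inferred; C1-C2 = 20 is assumed.
ROADS = {frozenset(("C1", "C3")): 5, frozenset(("C3", "C2")): 10,
         frozenset(("C1", "C2")): 20}
CITIES = ["C1", "C2", "C3"]

def build_actions():
    """Return (name, pprec, add, cost) tuples for the example domain."""
    actions = []
    for c in CITIES:
        actions.append((f"Load {c}", {f"at T {c}", f"at P {c}"}, {"in P T"}, EPS))
        actions.append((f"Unload {c}", {f"at T {c}", "in P T"}, {f"at P {c}"}, EPS))
        for d in CITIES:
            if c != d:
                cost = ROADS[frozenset((c, d))] + EPS
                actions.append((f"Drive {c} {d}", {f"at T {c}"}, {f"at T {d}"}, cost))
    return actions

def expand(s0, goals, actions):
    """Cost-indexed expansion; returns (level costs, proposition costs) or None."""
    cost = {p: Fraction(0) for p in s0}  # propositions missing from cost are at infinity
    prog_prop = set(s0)
    level_costs = []
    while any(g not in cost for g in goals):
        if not prog_prop:
            return None  # no programmed proposition left: goals unreachable
        c = min(cost[p] for p in prog_prop)                  # level cost
        prog_prop -= {p for p in prog_prop if cost[p] == c}  # proposition level P_c
        level_costs.append(c)
        for _name, pprec, add, a_cost in actions:            # action level A_c
            if all(p in cost and cost[p] <= c for p in pprec):
                reach = sum(cost[p] for p in pprec)          # cost_reach(a)
                for p in add:
                    if p not in cost or reach + a_cost < cost[p]:
                        cost[p] = reach + a_cost             # program p with its new cost
                        prog_prop.add(p)
    return level_costs, cost

levels, cost = expand({"at T C1", "at P C2"}, {"at P C3"}, build_actions())
print(levels)
print(cost["at P C3"])
```

Exact fractions are used instead of floating-point numbers so that level costs such as 15 + 2ε and 15 + 3ε compare reliably.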

The relaxed plan obtained using our RPG is the following:

P1 = Drive C1 C3 → Drive C3 C2 → Load C2 → Unload C3

The following plan has been computed by means of FF's relaxed plan
extraction algorithm [7]:

P2 = {Drive C1 C2, Drive C1 C3} → Load C2 → Unload C3

The benefits of our proposal can be easily observed just by comparing both
relaxed plans. It can be observed that P1 is almost executable (it only needs one
action to go from C2 to C3). Plan P2, however, has hard conflicts since actions
Drive C1 C2 and Drive C1 C3 are mutually exclusive. Moreover, action Drive
C1 C2 should not be included in the plan because of its high cost. Obviously,
if the planner only takes into account the relaxed plan length to compute its
heuristics, this technique does not bring many advantages. On the contrary,
planners that use the information provided by the relaxed plan will find it a valuable
help in building the final plan.



4 Results

The RPG expansion for handling numeric criteria has been implemented in the 
SimPlanner v3 planner [11]. SimPlanner v3 is a heuristic planner that can take 
advantage of the presented improvements on the relaxed planning graphs. In 
other heuristic planners that only use the relaxed plan length, like FF or HSP, 
the extra valuable information from the RPG is not fully exploited. 

Our proposed expansion and the traditional one are compared in this section. 
Both techniques are implemented in the same planner (SimPlanner v3). This
way, the results are not influenced by other characteristics of the planner. For 
this reason, we have not included comparisons between SimPlanner v3 and other 
heuristic planners, since the results would not provide reliable information about 
the performance of our proposal. 

The domains used in this comparison are numeric domains introduced in
the third international planning competition (IPC'02). The domains used in the
competition are described at

http://planning.cis.strath.ac.uk/competition.

Table 3 shows the results obtained in this comparison for the problems in the
DriverLog, ZenoTravel and Depots numeric domains. DriverLog is a variation
of Logistics where trucks need drivers. Drivers can move along different
road links than trucks. The optimization criterion in DriverLog is to minimize
an instance-specific linear combination of total time, driven distance and
walked distance. ZenoTravel is a transportation domain, where objects must be
transported via aeroplanes. The optimization criterion is to minimize an
instance-specific linear combination of total time and fuel consumption. The
Depots domain is a combination of the Logistics and Blocksworld domains, where
some blocks must be transported with trucks between depots and arranged
in a certain order. The optimization criterion is to minimize the overall fuel
consumption.

The results in Table 3 show the plan quality of the solutions, according to the
problem metric defined, and the running times. These results show that our
proposal improves the plan quality in most of the problems, and solves more
problems than the traditional approach. On average, the computed plans
are 1.87, 1.25 and 1.23 times better in the DriverLog, ZenoTravel and Depots
domains respectively. However, due to the heuristic nature of SimPlanner v3,
there are a few problems where the traditional approach obtains better plans.

Regarding the running times, Table 3 shows that our proposal takes more
time than the traditional one. This is mainly due to three factors:

— The Application of Formula (1) to Estimate the Action Costs.
However, in these domains this computation has only a slight effect on the
running time since the costs are static, that is, they do not depend on the
state. For the same reason, the estimated costs for these domains are
only computed once in the planning process.

— The Number of Graph Levels. This number is from 10 to 70 times
greater in our proposal, although the number of actions per level is lower.

— The Number of Actions in the Graph. This number is often greater
than in the traditional approach due to the expansion of many useless
low-cost actions first. In the DriverLog domain, for example, a lot of
unnecessary walk actions are inserted in the graph. This is the main reason
for the greater running times in most of the problems.

Table 3. Comparison between the results obtained with the proposed expansion and
with the traditional one (in the form proposed / traditional) for the numeric
DriverLog, ZenoTravel and Depots domains. Quality depends on the problem metric (greater
numbers stand for more costly plans), and time is measured in seconds

       DriverLog                  ZenoTravel                    Depots
Prob.  Quality       Time         Quality         Time          Quality      Time
1      777/777       0.01/0.01    13564/13564     0.01/0.01     32/42        0.01/0.01
2      999/1625      0.08/0.13    6786/6786       0.01/0.01     43/53        0.04/0.03
3      1406/1406     0.03/0.03    6758/6758       0.01/0.01     29/29        0.22/0.15
4      1119/986      0.05/0.05    27000/27000     0.01/0.01     64/50        0.54/0.44
5      1056/1270     0.05/0.05    3978/3978       0.02/0.02     80/259       0.92/2.54
6      2095/2466     0.03/0.02    25097/25097     0.03/0.03     313/313      205.2/195.5
7      1876/1476     0.07/0.04    11198/11198     0.02/0.02     37/57        0.1/0.1
8      3418/3472     0.17/0.09    29677/55480     0.04/0.04     43/43        0.43/0.43
9      4091/9144     1.43/1.53    13275/12644     0.14/0.06     231/409      88.47/79.92
10     241/3211      0.07/0.26    177368/175218   0.36/0.2      27/27        0.27/0.27
11     753/738       0.18/0.07    25505/65093     0.15/0.05     229/196      8.35/13.05
12     6713/5346     5.86/0.7     52206/38547     0.2/0.08      299/389      34.09/27.54
13     2886/3556     0.83/0.76    112468/95657    0.55/0.22     27/27        1.93/2
14     11153/-       2.22/-       430417/183591   3.51/0.27     43/43        2.4/2.42
15     3561/3464     2.79/0.59    59835/205701    1.92/1.65     237/214      28.21/34.35
16     16829/39771   110.6/24     64119/87530     4.14/1.48     31/32        0.29/0.27
17     20655/93031   60/27.7      190845/384645   36.94/12.36   29/31        0.53/0.53
18     70917/82266   283.4/91.3   67610/71944     20.42/7.26    108/127      37.4/36.73
19     -/-           -/-          235378/257104   44.46/15.2    48/48        4.65/4.7
20     11555/27884   462/70.7     111671/355711   26.91/20.21   217/-        28.46/-
Avg.   1.87 better   4.26 slower  1.25 better     2.36 slower   1.23 better  1.03 slower

These three factors slightly increase the time consumed in the RPG creation.
However, the overall time increment is more significant as SimPlanner v3 has to
build many RPGs when solving a problem. For example, the number of created
RPGs is greater than 10^5 for some problems.



5 Conclusions and Future Work 

In this paper, we have presented an extension to the traditional relaxed planning
graph generation. A relaxed planning graph is based on a Graphplan-like
expansion, where delete effects are ignored. These graphs are widely
used in many heuristic planners.

The proposed extension makes it possible to take into account the optimization
criterion (or metric) of the problem. During the graph expansion, an estimate of the cost
of the actions is computed according to the problem metric. This estimate is
used to expand the less costly actions first. Practical results show that
our proposal, in general, obtains better solutions than the traditional approach.

The relaxed planning graphs are the starting point for the heuristic estimators
of many planners. Therefore, all the improvements on these graphs can help
the planners increase their performance. Metric-FF [7], for example, proposes
an extension to deal with the numeric effects of the actions. The Metric-FF
expansion is completely compatible with our proposed expansion and, therefore,
both techniques could be implemented in the same planner. However, there are
several features that have not been addressed in the relaxed planning graph
framework yet. Handling probabilities and sensing actions, for example, could
allow the planners to face problems with uncertainty more efficiently.



Abstract. In constraint satisfaction, a general rule is to tackle the hardest
part of a search problem first. In this paper, we introduce a parameter
(τ) that measures the constrainedness of a search problem. This parameter
represents the probability of a problem being feasible. A value of τ = 0
corresponds to an over-constrained problem, where no states are expected
to be solutions. A value of τ = 1 corresponds to an under-constrained
problem, where every state is a solution. This parameter can also be used
in heuristics to guide search. To obtain this parameter, a simple random
or systematic sampling is carried out to compute the tightness of each
constraint. New heuristics are developed to classify the constraints from
the tightest constraint to the loosest constraint and to remove redundant
constraints in constraint satisfaction problems. These heuristics may
accelerate the search because inconsistencies can be found earlier and the
absence of such redundant constraints eliminates unnecessary checking
and saves storage space.
Keywords: Constraint Satisfaction Problems, constrainedness, heuris- 
tics. 



1 Introduction 

Many real problems in Artificial Intelligence (AI) as well as in other areas of 
computer science and engineering can be efficiently modeled as Constraint Sat- 
isfaction Problems (CSPs) and solved using constraint programming techniques. 
Some examples of such problems include: spatial and temporal planning, quali- 
tative and symbolic reasoning, diagnosis, decision support, scheduling, hardware 
design and verification, real-time systems and robot planning. 

These problems may be soluble or insoluble, and they may be hard or easy. How
to solve these problems has been the subject of intensive study in recent years.
Some works focus on the constrainedness of search. Heuristics that make
the choice that minimises the constrainedness can reduce search [4]. The
constrainedness "knife-edge" measures the constrainedness of a problem during
search [12].

Furthermore, some other works try to reduce a CSP by identifying and
removing redundant constraints [10]: when a constraint allows all the possible
value assignments of the variables on which it is defined, none of its
tuples has to be checked. Thus, the absence of such a constraint eliminates
unnecessary checking and saves storage space [7]. However, identifying redundant
constraints is hard in general [10].

However, most of the work is focused on general methods for solving CSPs.
They include backtracking-based search algorithms. While the worst-case
complexity of backtrack search is exponential, several heuristics to reduce its
average-case complexity have been proposed in the literature [3]. For instance,
some algorithms incorporate features such as ordering heuristics. Thus, some
heuristics based on variable ordering and value ordering [8] have been
developed, due to the additivity of the variables and values. However, constraints are
also considered to be additive, that is, the order of imposition of constraints does
not matter; all that matters is that the conjunction of constraints be satisfied
[1]. In spite of the additivity of constraints, only a few works have been done on
constraint ordering heuristics, mainly for arc-consistency algorithms [11,5].

Here, we introduce a parameter that measures the "constrainedness" of the
problem. This parameter, called τ, represents the probability of a problem being
feasible and identifies the tightness of the constraints. This parameter can also be
applied in heuristics to guide search. To obtain this parameter, we compute
the tightness of each constraint. Using these tightnesses, we have developed
two heuristics to accelerate the search. These heuristics perform constraint
ordering and redundant constraint removal. They can easily be applied to any
backtracking-based search algorithm.

The first one classifies the constraints by means of their tightness, so that
the tightest constraints are studied first. This is based on the principle that,
in good orderings, domain values are removed as quickly as possible. This idea
was first stated by Waltz [13]: "The base heuristic for speeding up the program is
to eliminate as many possibilities as early as possible" (p. 60). An appropriate
ordering is straightforward if the constrainedness is known in advance. However,
in the general case, a good classification is suitable to tackle the hardest part of
the search problem first.

The second heuristic is based on the idea of reducing a CSP into an "easy
problem" by removing redundant constraints [10].

In the following section, we formally define a CSP and summarize some
heuristics. In section 3, we present a well-known definition of constrainedness
of search problems. A new parameter to measure the constrainedness of a problem
is developed in section 4. In section 5, we present our constraint ordering
heuristic. Section 6 summarizes the conclusions and future work.



2 Definitions and Heuristics

In this section, we review some basic definitions as well as heuristics for constraint 
ordering, constrainedness and redundant constraints removing for CSPs. 

2.1 Definitions 

Definition 1. A constraint satisfaction problem (CSP) consists of:

- a set of variables $V = \{v_1, v_2, \ldots, v_n\}$;

- for each variable $v_i \in V$, a set $D_{v_i}$ of possible values (its domain); we denote
by $d_{v_i}$ the size of domain $D_{v_i}$;

- a finite collection of constraints $C = \{c_1, c_2, \ldots, c_k\}$ restricting the values that
the variables can simultaneously take.

Definition 2. A solution to a CSP is an assignment of values to all the variables 
so that all constraints are satisfied. 

Definition 3. A redundant constraint is a constraint that can be removed without 
changing the solutions. 
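Definitions 1 to 3 can be illustrated with a minimal Python sketch. The toy variables, domains and constraints below are hypothetical and are not taken from the paper; the brute-force enumeration is only practical for very small problems.

```python
from itertools import product

# Hypothetical toy CSP (illustration only): two variables with domain {1, 2, 3}.
domains = {"x": [1, 2, 3], "y": [1, 2, 3]}
constraints = {
    "c1": lambda a: a["x"] + a["y"] <= 4,
    "c2": lambda a: a["x"] + a["y"] <= 5,  # implied by c1
    "c3": lambda a: a["x"] != a["y"],
}

def solutions(domains, constraints):
    """Enumerate every assignment that satisfies all constraints (Definition 2)."""
    names = list(domains)
    found = []
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints.values()):
            found.append(assignment)
    return found

def is_redundant(name, domains, constraints):
    """A constraint is redundant if removing it leaves the solutions unchanged (Definition 3)."""
    rest = {k: c for k, c in constraints.items() if k != name}
    return solutions(domains, rest) == solutions(domains, constraints)

print(len(solutions(domains, constraints)))
print(is_redundant("c2", domains, constraints))
```

In this toy problem c2 is redundant because every state allowed by c1 is also allowed by c2, whereas removing c1 or c3 would admit new solutions.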

There is not a broadly accepted definition of constrainedness. We adopt the 
following definition of constrainedness: 

Definition 4. The constrainedness of a problem is a predictor of the computational
cost of finding a solution.

2.2 Heuristics 

The experiments and analyses by several researchers have shown that the order
in which variables and values are assigned during the search may have a
substantial impact on the complexity of the search space explored. In spite of the
additivity of constraints, only a few works have addressed constraint ordering.

Wallace and Freuder initiated a systematic study to identify factors that
determine the efficiency of constraint propagation that achieves arc-consistency
[11]. Gent et al. proposed a new constraint ordering heuristic in AC3, where the
set of choices is composed of the arcs in the current set maintained by AC3 [5].
They considered the remaining subproblem to have the same set of variables as
the original problem, but with only those arcs still remaining in the set.

On the other hand, many problems may contain redundant constraints.
Edward Tsang [10] proposed the possibility of reducing a CSP to an "easy
problem" by removing redundant constraints. However, redundancy is in general
difficult to detect, although some redundant constraints are easier to identify
than others. In [2], which focuses on binary CSPs, a number of concepts for
helping to identify redundant binary constraints are introduced. For example, a
constraint in a binary CSP can be removed if it is path-redundant (definition
3-19 [10]). The time complexity of path-redundant is $O(nd^3)$. However, since not
every problem can be reduced to an easier problem, one should judge the likelihood
of succeeding in reducing the problem in order to justify the complexity
of procedures such as path-redundant.

Heuristics that make the choice minimising the constrainedness of the resulting
subproblem can reduce search compared with standard heuristics [4]. Walsh studied
the constrainedness "knife-edge", measuring the constrainedness
of a problem during search in several different domains [12]. He observed that
critically constrained problems tend to
remain critically constrained. This knife-edge is predicted by a theoretical
lower-bound calculation. Many of these algorithms focus their approximate theories on
just two factors: the size of the problem and the expected number of solutions,
which is difficult to obtain.

In [4], Gent et al. present a parameter that measures the constrainedness of
an ensemble of combinatorial problems. They assume that each problem in an
ensemble has a state space $S$ with $|S|$ elements, a number $Sol$ of which
are solutions. Any point in the state space can be represented by an $N$-bit binary
vector, where $N = \log_2(|S|)$. Let $\langle Sol \rangle$ be the expected number of solutions
averaged over the ensemble. They defined the constrainedness, $\kappa$, of an ensemble by

$$\kappa =_{def} 1 - \frac{\log_2(\langle Sol \rangle)}{N} \qquad (1)$$

However, this parameter defines the constrainedness of constraint satisfaction
problems in general, but not of an individual problem.
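As a small numerical illustration of equation (1), the following Python sketch evaluates the ensemble constrainedness for a given state-space size and expected number of solutions. The function name `kappa` and the numbers used are illustrative assumptions, not values from [4].

```python
import math

def kappa(state_space_size, expected_solutions):
    """Equation (1): 1 - log2(<Sol>) / N, with N = log2(|S|)."""
    n_bits = math.log2(state_space_size)
    return 1.0 - math.log2(expected_solutions) / n_bits

# Illustrative numbers only: a state space of 2^10 states.
print(kappa(1024, 1024))  # every state is a solution
print(kappa(1024, 32))    # 32 expected solutions
print(kappa(1024, 1))     # a single expected solution
```

With 1024 states, the value rises from 0 when every state is a solution to 1 when a single solution is expected.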



3 Constrainedness r 

In this section, we introduce a parameter called r that measures the constrainedness
of the problem. This parameter represents the probability of a problem being
feasible and lies in the range [0, 1]. A value of r = 0 corresponds to
an over-constrained problem, in which no state is expected to be a solution
($\langle Sol \rangle = 0$). A value of r = 1 corresponds to an under-constrained problem,
in which every state is expected to be a solution
($\langle Sol \rangle = \prod_{v \in V} d_v$). This parameter can also be used in a heuristic to
guide search. To this end, we take advantage of the tightness of each constraint
to classify the constraints from the tightest to the loosest and to
remove some redundant constraints. Thus, a search algorithm can tackle the
hardest part of the problem first with a lower number of constraints.

As we pointed out, a simple random or systematic sampling is performed to
compute r: there is a target population (the states), and the sampled population
is composed of $s(n)$ random and well-distributed states, where $s$ is a polynomial
function.

As in statistics, the user selects the desired precision through the size of the sample
$s(n)$. We study how many states $st_i$ ($st_i \le s(n)$) satisfy each constraint $c_i$ (see
Figure 1). Thus, each constraint $c_i$ is labeled with $p_{c_i}$, written $c_i(p_{c_i})$, where
$p_{c_i} = st_i/s(n)$ represents the proportion of sampled states that satisfy $c_i$, that is,
the tightness of the constraint.

In this way, given the set of probabilities $\{p_{c_1}, \ldots, p_{c_k}\}$, the number of solutions
can be computed as:

$$\langle Sol \rangle = \Big(\prod_{v \in V} d_v\Big) \times \Big(\prod_{c_i \in C} p_{c_i}\Big) \qquad (2)$$

[Figure 1: states 1, 2, 3, ..., s(n), selected by simple random or systematic sampling, are checked against each constraint.]

Fig. 1. From non-ordered constraint to ordered constraint

This equation is equivalent to the one obtained in [4]. However, our definition of
constrainedness is given by the following equation:

$$r = \prod_{c_i \in C} p_{c_i} \qquad (3)$$

r is a parameter that measures the probability that a randomly selected state
is a solution, that is, the probability that this state satisfies the first constraint ($p_{c_1}$),
the second constraint ($p_{c_2}$), and so forth up to the last constraint ($p_{c_k}$).
Thus, this parameter lies in the range [0, 1] and represents
the constrainedness of the problem.

We present the pseudo-code for computing r.

Computing the constrainedness r

Inputs: A set of n variables, $v_1, \ldots, v_n$;
For each $v_i$, a set $D_i$ of possible values (the domain);
A set of constraints, $c_1, \ldots, c_k$.

Outputs: The constrainedness r.

1. From the set of states generated by the Cartesian product of the variable
domains, a random and well-distributed sample of s(n) states is selected.

2. With the selected sample of s(n) states, we compute how many states $st_i$
($st_i \le s(n)$) satisfy each constraint $c_i$, $i = 1..k$. Thus, $c_i$ is labelled with
$p_{c_i} = st_i/s(n)$.

3. $r := \prod_{c_i \in C} p_{c_i}$
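The three steps above can be sketched in Python as follows. This is a minimal sketch that assumes uniform random sampling with replacement, which is only one way to obtain a "random and well-distributed" sample; the function name `estimate_r`, the seed parameter and the toy problem are hypothetical.

```python
import random
from math import prod

def estimate_r(domains, constraints, sample_size, seed=0):
    """Estimate r as the product of the sampled tightnesses p_ci = st_i / s(n)."""
    rng = random.Random(seed)
    names = list(domains)
    # Step 1: draw s(n) states from the Cartesian product of the domains
    # (assumption: uniform sampling with replacement).
    sample = [{v: rng.choice(domains[v]) for v in names} for _ in range(sample_size)]
    # Step 2: fraction of sampled states that satisfy each constraint.
    tightness = [sum(1 for s in sample if c(s)) / sample_size for c in constraints]
    # Step 3: r is the product of the tightnesses.
    return prod(tightness), tightness

# Hypothetical toy problem with n = 2 variables and s(n) = 7 * n^2 = 28 states.
domains = {"x": range(1, 6), "y": range(1, 6)}
constraints = [lambda s: s["x"] + s["y"] <= 6, lambda s: s["x"] != s["y"]]
r, p = estimate_r(domains, constraints, sample_size=7 * 2 ** 2)
print(r, p)
```

The returned list of tightnesses is the same information that the heuristics of the next section reuse.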
4 Some Heuristics Using r

To compute r, it is necessary to obtain the tightness of each constraint, represented
by the following set of probabilities $\{p_{c_1}, \ldots, p_{c_k}\}$. We can take advantage
of this information to develop some heuristics to guide search or to improve
efficiency.

The first heuristic classifies the constraints so that a search
algorithm can manage the hardest part of a problem first. Figure 2 shows the
constraints in the natural order and classified by tightness. If the tightest
constraints are very constrained (r ≈ 0), the problem may be over-constrained.
However, if these tightest constraints are under-constrained (r ≈ 1), then the
problem will be under-constrained.




Fig. 2. From non-ordered constraints to ordered constraints: Constrainedness 



The second heuristic focuses on constraint removal in random CSPs.
The loosest constraints are analyzed and the redundant ones are removed from the
problem.

We now describe these two heuristics.

4.1 Constraint Ordering Heuristic 

This simple heuristic takes advantage of r by making use of the probabilities of the
constraints $p_{c_1}, p_{c_2}, \ldots, p_{c_k}$. It classifies the constraints in ascending
order of the labels $p_{c_i}$, so that the tightest constraints are classified first:
$p_{c_{ord_1}}, p_{c_{ord_2}}, \ldots, p_{c_{ord_k}}$ (see Figure 2).

Thus, a backtracking-based search algorithm can tackle the hardest part of
a search problem first, inconsistencies can be found earlier and the number
of constraint checks can be significantly reduced.
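The effect of the ordering on the number of constraint checks can be seen in a small Python sketch. It uses a plain generate-and-test loop that stops at the first violated constraint, and the tightness values are computed exactly by hand rather than sampled; the two constraints and all function names are hypothetical.

```python
from itertools import product

def count_checks(domains, ordered_constraints):
    """Generate-and-test over all states, stopping at the first violated constraint."""
    names = list(domains)
    checks = solutions = 0
    for values in product(*(domains[v] for v in names)):
        state = dict(zip(names, values))
        for c in ordered_constraints:
            checks += 1
            if not c(state):
                break
        else:
            solutions += 1
    return checks, solutions

def order_by_tightness(constraints, tightness):
    """Ascending p_ci: the tightest constraint (smallest p) comes first."""
    pairs = sorted(zip(tightness, range(len(constraints))))
    return [constraints[i] for _, i in pairs]

domains = {"x": range(1, 6), "y": range(1, 6)}   # 25 states
loose = lambda s: s["x"] + s["y"] <= 9           # satisfied by 24 of 25 states
tight = lambda s: s["x"] + s["y"] <= 3           # satisfied by 3 of 25 states
natural = [loose, tight]
ordered = order_by_tightness(natural, [24 / 25, 3 / 25])

print(count_checks(domains, natural))
print(count_checks(domains, ordered))
```

Both orderings find the same 3 solutions, but the natural order needs 49 checks and the tightest-first order needs 28.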

4.2 Redundant Constraint Removing Heuristic 

In many random problems, some constraints may be redundant. A redundant
constraint is a constraint that can be removed without changing the solutions.
Some redundant constraints may be easier to identify than others [10].

In global constraints, symbolic reasoning based on rewriting and the introduction
of redundant constraints may improve consistency and help CSP solvers to find
the result more directly [6].

However, in general, redundant constraints do not always save search effort
[7]. For example, if a constraint allows all the possible value assignments of the
variables on which it is defined, none of the constraint's tuples has to be checked.
The absence of such a constraint, in fact, eliminates unnecessary checking and
saves storage space [7].

We will focus on this line: our main goal is to identify redundant
constraints in order to significantly reduce the number of constraint checks.
In the case of global constraints, this heuristic may also help to identify the set
of redundant constraints.

We can identify two different types of redundant constraints:

- Constraints that are made redundant by another single constraint (e.g.,
x + y < 10 is made redundant by x + y < 5).

- Constraints that, due to their topology, are satisfied by all possible assignments
(e.g., x + y < 5 with domains x : {1, 2} and y : {1, 2}).

The first type of constraint is not directly related to r, since the tightness
$p_{c_i}$ of each constraint $c_i$ is not so relevant for identifying redundancy.

The second type of constraint may be identified by the tightness of the
constraints. A constraint with $p_c = 1$ may be redundant, because all the randomly selected
states in the sample satisfy it. However, it may happen by coincidence that all
the selected states are consistent with a constraint that is not redundant.
So, we identify some types of constraints that can be removed if $p_c = 1$ and they
satisfy a simple formula.

The main type of constraints is arithmetic constraints of the form:

$$\sum_{i=1}^{n} \alpha_i v_i < \gamma \qquad (4)$$

Each constraint $c_j$ of the form (4) with $p_{c_j} = 1$ can be eliminated if:

$$\sum_{i=1}^{n} \alpha_i \beta_i < \gamma, \qquad \beta_i = \begin{cases} D_i^{+} & \text{if } \alpha_i > 0 \\ 0 & \text{if } \alpha_i = 0 \\ D_i^{-} & \text{if } \alpha_i < 0 \end{cases} \qquad (5)$$

where $D_i^{-}$ and $D_i^{+}$ correspond to the lower and upper bounds of the domain of variable $v_i$.
In this way, all constraints with $p_c = 1$ satisfying the above formula can be
removed. The absence of such constraints eliminates unnecessary checking and
saves storage space [7].
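Condition (5) amounts to evaluating the left-hand side of the constraint at its worst-case corner of the domain bounds. A minimal Python sketch, assuming strict inequalities as printed above and a hypothetical function name `removable`:

```python
def removable(coeffs, gamma, bounds):
    """Check condition (5) for sum(alpha_i * v_i) < gamma.

    bounds[i] is the pair (lower, upper) of the domain of variable v_i.
    """
    worst = 0
    for alpha, (lower, upper) in zip(coeffs, bounds):
        if alpha > 0:
            worst += alpha * upper   # beta_i = D_i^+
        elif alpha < 0:
            worst += alpha * lower   # beta_i = D_i^-
        # alpha == 0 contributes nothing (beta_i = 0)
    return worst < gamma

# The example from the text: x + y < 5 with x, y in {1, 2}.
print(removable([1, 1], 5, [(1, 2), (1, 2)]))
# A tighter constraint on the same domains, x + y < 4, is not removable.
print(removable([1, 1], 4, [(1, 2), (1, 2)]))
```

If the worst-case value already satisfies the inequality, every assignment does, so the constraint cannot exclude any state.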



5 Evaluation of r and Heuristics 

In this section, we evaluate our parameter r and both heuristics. To estimate the
constrainedness of random problems, we compare r with the actual constrainedness,
obtained by computing all solutions of the random problems. To evaluate
the constraint ordering heuristic, we incorporated it into well-known
CSP solvers: Backtracking (BT), Generate&Test (GT), Forward Checking
(FC) and Real Full Look Ahead (RFLA)¹, because they are the most
appropriate techniques for observing the number of constraint checks. Finally,
we evaluated the redundant constraint removing heuristic on random problems
using BT.

Table 1. Random instances <n,c,d>, n: variables, c: constraints and d: domain size

Problems    Actual           Parameter  Number of  Number of  Number of       % Error
            constrainedness  r          states     solutions  estimated sol.
<3,5,5>     0.09             0.07       125        11.2       8.7             2%
<3,5,10>    0.05             0.043      1000       50         43              0.7%
<3,10,5>    0.024            0.013      125        3          1.6             1.12%
<5,5,5>     0.04             0.038      3125       125        118.7           0.2%
<5,10,5>    0.008            0.01       3125       25         31.2            0.19%
<5,10,10>   0.0045           0.0034     100000     453        340             0.1%

5.1 Evaluating r 

In our empirical evaluation, each random CSP was defined by the 3-tuple
<n,c,d>, where n was the number of variables, c the number of constraints and
d the domain size. The problems were randomly generated by modifying these
parameters. We evaluated 100 test cases for each type of problem. We present
the average actual constrainedness obtained by computing all solutions, our estimator r
using a sample of $s(n) = 7n^2$ states, the number of possible states, the average
number of solutions, the average number of estimated solutions using
r and the error percentage.

Table 1 shows some types of random problems. For example, in problems with
5 variables, each with 5 possible values, and 5 constraints (<5,5,5>), the number
of possible states is $d^n = 5^5 = 3125$ and the average number of solutions is 125,
so the actual constrainedness is 0.04. With a sample of $7n^2 = 175$ states, we
obtain an average of 6.64 solutions. Thus, our parameter is r = 0.038 and
the estimated number of solutions of the entire problem is 118.7. In this way, the
error percentage is only 0.2%.
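The arithmetic of this example can be retraced with a few lines of Python. The error percentage is computed here as the difference between actual and estimated solutions relative to the number of states; this reading is an assumption that happens to reproduce the values in Table 1, since the text does not state the formula.

```python
n, d = 5, 5
states = d ** n                        # number of possible states
sample_size = 7 * n ** 2               # s(n) = 7 n^2
actual_solutions = 125
sampled_solutions = 6.64               # average number of solutions found in the sample

actual_constrainedness = actual_solutions / states
r = sampled_solutions / sample_size
estimated_solutions = r * states
# Assumed definition of the error column: |actual - estimated| / states, in percent.
error_pct = abs(actual_solutions - estimated_solutions) / states * 100

print(states, sample_size)
print(round(actual_constrainedness, 3), round(r, 3))
print(round(estimated_solutions, 1), round(error_pct, 1))
```

The unrounded estimate differs slightly from the 118.7 reported in the text, which follows from multiplying the rounded value r = 0.038 by 3125.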

5.2 Evaluating the Constraint Ordering Heuristic 

The n-queens problem is a classical search problem used to analyse the behaviour of
algorithms. Table 2 shows the constraint check savings obtained in the n-queens
problem.



¹ BT, GT, FC and RFLA were obtained from CON'FLEX, which can be found at:

http://www-bia.inra.fr/T/conflex/Logiciels/adressesConflex.html
Table 2. Constraint check saving using GT, BT, FC and RFLA in the n-queens
problem

          Constraint check saving
queens    GT & GT+CO            BT & BT+CO     FC & FC+CO     RFLA & RFLA+CO
5         2.1 x 10^4            2.4 x 10^2     150            110
10        4.1 x 10^11           3.9 x 10^7     1.4 x 10^5     9.3 x 10^4
20        1.9 x 10^26           3.6 x 10^18    9.6 x 10^14    6.03 x 10^11
50        2.4 x 10^[illegible]  3.6 x 10^52    3.1 x 10^44    1.6 x 10^32
100       2.1 x 10^143          2.1 x 10^106   4.5 x 10^93    1.8 x 10^66
150       5.2 x 10^219          3.7 x 10^161   6.8 x 10^142   2.1 x 10^100
200       9.4 x 10^295          8.7 x 10^219   9.9 x 10^198   2.2 x 10^134

We incorporated our constraint ordering (CO) into well-known CSP solvers:
GT+CO, BT+CO, FC+CO and RFLA+CO. Here, the objective is to find all
solutions. The results show that the constraint check saving was
significant in GT+CO and BT+CO, and lower but still significant in FC+CO and
RFLA+CO. This is because these techniques are more powerful than BT and GT.



Table 3. The redundant constraint removing heuristic in random instances <n,c,d>

Problems    Redundant    Constraint checks  Constraint checks   Constraint checks
            constraints  (entire problem)   (filtered problem)  saving
<5,5,5>     1.75         15625              10325               5300
<5,10,5>    3.5          31250              20312               10938
<5,20,5>    7            62500              40625               21875
<5,5,10>    1.75         5 x 10^5           3.3 x 10^5          1.7 x 10^5
<5,10,10>   3.5          1 x 10^6           6.5 x 10^5          3.5 x 10^5
<5,20,10>   7            2 x 10^6           1.3 x 10^6          7 x 10^5

5.3 Evaluating the Redundant Constraint Removing Heuristic 

We evaluated this heuristic over random problems as presented in section 5.1. In
this case, all constraints are global constraints, that is, all constraints have arity n.
Table 3 shows, for each type of problem, the average number of
redundant constraints ($red_c$), the number of constraint checks for the entire
problem, the number of constraint checks for the filtered problem (without
redundant constraints) and the constraint check saving using
backtracking (BT). It can be observed that the number of constraint checks for
the entire problem was $d^n c$. The number of constraint checks for the filtered problem was
$d^n (c - red_c)$. So, the constraint check saving was $d^n \, red_c$, which corresponds to a
saving of 35%. These formulas may be standardized for backtracking: if the
average number of redundant constraints is known, the constraint check saving can
be obtained as $d^n \, red_c$.
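These formulas are easy to evaluate for one row of Table 3. The sketch below uses a hypothetical helper `bt_checks` and the <5,10,5> instance with an average of 3.5 redundant constraints.

```python
def bt_checks(n, c, d, red_c):
    """Constraint checks for the entire and filtered problems, and the saving."""
    total = d ** n * c
    filtered = d ** n * (c - red_c)
    saving = d ** n * red_c
    return total, filtered, saving

total, filtered, saving = bt_checks(n=5, c=10, d=5, red_c=3.5)
print(total, filtered, saving, saving / total)
```

The result (31250, 20312.5, 10937.5) matches the rounded table entries, and the ratio of saving to total checks is 0.35.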
6 Conclusion and Future Work

In this paper, we introduce a parameter (r) that measures the "constrainedness"
of a search problem. r represents the probability of a problem being feasible. A
value of r = 0 corresponds to an over-constrained problem, and r = 1 corresponds
to an under-constrained problem. This parameter can also be used in a heuristic
to guide search. To obtain this parameter, we compute the tightness of each
constraint. We can take advantage of these tightnesses to classify the constraints
from the tightest to the loosest and to remove redundant
constraints. Using the constraint ordering heuristic and the redundant constraint
removing heuristic, the search can be accelerated because inconsistencies can be
found earlier and the number of constraint checks can be significantly reduced.

Furthermore, these heuristic techniques are appropriate for solving problems as
distributed CSPs [9], in which agents are committed to solving their subproblems.

For future work, we are working on integrating constraint ordering with variable
ordering in centralized and distributed CSPs.
Abstract. The present work deals with the blocks world domain. While studying
and analysing the theory about blocks world (BW) problems (BWP) an effective
algorithm to solve instances was "discovered". This algorithm solves
optimally many instances of blocks world problems and involves no search. 
The algorithm is backed up by ideas stated in Gupta and Nau (1991,1992) [5,6] 
and in Slaney and Thiebaux (1996,2001) [9,10]; the algorithm is suitable for 
programming and could serve to build an optimal block solver without forward 
search. The ideas behind this algorithm may constitute a basis for investigating 
other problems of more practical interest like the container loading problem 
and the bin packing problem. 



1 Introduction 

Blocks world planning has been widely investigated by planning researchers, pri- 
marily because it appears to capture several of the relevant difficulties posed to 
planning systems. It has been especially useful in investigations of goal and sub- 
goal interactions in planning. The primary interactions studied have been deleted- 
condition interactions, such as creative destruction and Sussman’s anomaly [2,3,4], 
in which a side-effect of establishing one goal or subgoal is to deny another goal or 
subgoal. 

A problem in classical planning is given by a set of actions, an initial state and a 
goal. These problems can be formulated as problems of search from the initial state 
to the set of goal states by applying actions that map one state into another. The 
result of this search is a path in the state-space or a plan [7]. This problem is
computationally hard and the best algorithms are bound to fail on certain classes of
instances [1].

One response to overcome computational difficulties is to abandon search alto- 
gether and construct a special algorithm for a planning domain, incorporating in some 
way domain specific knowledge. This algorithm can be thought of as a mapping from 
any initial state and goal state to an action to be taken towards achieving the final 
goal; this approach has been called reactive planning. The algorithm presented in this 
work follows this line of thought. 



C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 134-143, 2004. 
© Springer-Verlag Berlin Heidelberg 2004
2 Blocks World Theory

2.1 Preliminary Notions 

A fixed finite set of n blocks is given. A state is an arrangement of these blocks on a 
table where each block is on another block or on the table; no more than one block is 
on a given block and no block is on more than one block. Given an initial state I, and 
a final (or goal) state G, the problem is to describe how to go from I to G preferably 
in the least number of steps. The permissible steps are: 

1. Move onto the table a clear block , that is, one which has no block on it. 

2. Move a clear block on top of another block. 

A tower of height n of a state is a sequence $(b_1, b_2, \ldots, b_n)$ of blocks such that $b_1$ is
clear, $b_n$ is on the table and $b_i$ is on $b_{i+1}$ ($i = 1, \ldots, n-1$). Notation: the state s whose towers
are, for example, [1,5,4], [2] and [6,3] will be written s = [[1,5,4], [2], [6,3]]. Notice this
is the same as, for example, [[6,3], [1,5,4], [2]] but not the same as [[5,1,4], [2], [6,3]].
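This notation maps directly onto nested lists. The following Python sketch shows one possible encoding, in which each tower is listed from its clear top block down to the table; the helper names `canonical` and `on_relation` are hypothetical, and 0 is used for the table.

```python
def canonical(state):
    """Order of towers is irrelevant; order of blocks inside a tower is not."""
    return frozenset(tuple(tower) for tower in state)

def on_relation(state):
    """Map each block to what it rests on (0 denotes the table)."""
    below = {}
    for tower in state:
        for upper, lower in zip(tower, tower[1:]):
            below[upper] = lower
        below[tower[-1]] = 0
    return below

s1 = [[1, 5, 4], [2], [6, 3]]
s2 = [[6, 3], [1, 5, 4], [2]]
s3 = [[5, 1, 4], [2], [6, 3]]
print(canonical(s1) == canonical(s2))
print(canonical(s1) == canonical(s3))
print(on_relation(s1))
```

The first comparison holds and the second does not, mirroring the remark about reordering towers versus reordering blocks within a tower.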

Throughout this work we will deal many times with a certain group of predicates
which describe a block's position. To be more precise, let us look at and analyse
two examples.







[Figure: Initial State with towers [3,1] and [2]; Goal State with the single tower [1,2,3].]

Fig. 2.1. Example of a block situation



In figure 2.1 an example of a situation can be seen (composed of an initial state 
and goal state) for the 3-blocks world. This situation could be described using a set of 
primitive predicates as follows: 

on_table_I(1) ∧ on_table_I(2) ∧ clear_I(2) ∧ on_I(3,1) ∧ clear_I(3)
on_table_G(3) ∧ on_G(2,3) ∧ on_G(1,2) ∧ clear_G(1)

Figure 2.2 illustrates an example of the support predicates above_I(X,Y), above_G(X,Y)
and well_placed(X). The primitive or support predicates can also be negated, as
figure 2.2 shows in the example not well_placed(1), which will usually be written as
¬well_placed(1).

A well_placed block is a block whose initial position coincides with its goal position
and all blocks below it are well_placed.

If a block b is on the table in the initial and goal states then b is a well_placed
block.

Fig. 2.2. Example of support predicates (wp = well_placed)



A misplaced block is a block that is not well_placed and a misplaced tower is a 
tower whose top block is misplaced. Therefore [3,2,4] in figure 2.2 is an example of a 
misplaced tower. 
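The definitions of well_placed and misplaced blocks and of misplaced towers can be sketched in Python as follows, assuming states are lists of towers written from top to bottom and 0 denotes the table. The function names and the second example problem are hypothetical; the first example is the situation described by the predicates of figure 2.1.

```python
def on_relation(state):
    """Map each block to what it rests on (0 denotes the table)."""
    below = {}
    for tower in state:
        for upper, lower in zip(tower, tower[1:]):
            below[upper] = lower
        below[tower[-1]] = 0
    return below

def well_placed(block, initial, goal):
    """True if the block and every block beneath it sit on the same support in I and G."""
    on_i, on_g = on_relation(initial), on_relation(goal)
    b = block
    while b != 0:
        if on_i[b] != on_g[b]:
            return False
        b = on_i[b]
    return True

def misplaced_towers(initial, goal):
    """Towers of the initial state whose top block is misplaced."""
    return [t for t in initial if not well_placed(t[0], initial, goal)]

# Situation of figure 2.1: every block is misplaced.
fig_i, fig_g = [[3, 1], [2]], [[1, 2, 3]]
print([b for b in (1, 2, 3) if well_placed(b, fig_i, fig_g)])

# Hypothetical second problem: blocks 2 and 3 are well_placed, 1 and 4 are not.
ex_i, ex_g = [[1, 2, 3], [4]], [[4, 2, 3], [1]]
print([b for b in (1, 2, 3, 4) if well_placed(b, ex_i, ex_g)])
print(misplaced_towers(ex_i, ex_g))
```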

3 A Near Optimal Algorithm 

3.1 The (I-II-III)-Algorithm 

We will start by stating a theorem (Gupta and Nau, 1991) [5] which contains key
features for our subsequent analysis.

Theorem 3.1 (Gupta and Nau's Theorem)

Let B=(I,G) be any solvable BW problem, where 0 denotes the table, and let P be any
plan for B. Then there is a plan Q for B such that |Q| ≤ |P| and Q has the following
properties:

1. For every block b whose position in I is consistent with G, b is never moved in Q.

2. Q moves no block more than twice.

3. For every block b that is moved more than once, the first move is to the table.

4. For every block b that is moved to a location d ≠ 0, on(b,d) ∈ G.

5. For every block b that is moved more than once, in the state immediately preceding
the first move, no block whose position is inconsistent with G can be
moved to a position consistent with G.

With our terminology, property 1 would read:

1. A well_placed block b is never moved in a plan Q, since the notion of consistency is
equivalent to the notion of well_placed.

According to properties 1 and 2 of theorem 3.1, to go from an initial state I to a goal
state G following an optimal path, well_placed blocks do not need to be moved and
no block needs to be moved more than twice. This implies that, to calculate the distance
from I to G, it suffices to find a set S of blocks of I which is precisely the set of
blocks which are moved exactly once in an optimal path from I to G. Such a set S
will be called a 1-set (with respect to (I,G)) and an element of a 1-set a 1-block. In
general S is not unique, that is, there are examples of problems (I,G) such that there
exist two different sets S and S' with the property that S, and also S', is a 1-set with
respect to (I,G). In figure 3.1, S = {4} is a 1-set if 3 moves to the table and S' = {3} is a
1-set if 4 moves to the table.



[Figure: Initial State with towers [3,1] and [4,2]; Goal State with towers [4,1] and [3,2].]

Fig. 3.1. Example of 1-sets and 2-sets



Alternatively, to calculate the distance from I to G, it suffices to find a set T of
blocks of I which is precisely the set of blocks which are moved exactly twice in an
optimal path from I to G. Such a set T will be called a 2-set and an element of a 2-set
a 2-block. Like S, T is not unique. In figure 3.1, T = {3} is a 2-set if 3 moves to the
table and T' = {4} is a 2-set if 4 moves to the table.

It is easy to see that the distance from I to G is m + |T|, where m is the number of
misplaced blocks and |T| is the cardinality of a 2-set T with respect to (I,G).

Hence the problem of calculating the distance from I to G reduces to the problem
of calculating |T| (or |S|, since |S| + |T| = m).

Let us now give a useful definition:

Definition. A block b is locked (with respect to (I,G)) if it is misplaced and there
exists a block b' such that b is above b' in I and b is above b' in G. An example is
shown in figure 3.2.




Fig. 3.2. Locked blocks examples 
An important observation is that any locked block is moved twice in any path
(=plan) from I to G; thus locked blocks are 2-blocks. In general 2-blocks are not
locked blocks but in the special case in which I and G are towers, 2-blocks are the
same as locked blocks and in fact the set of locked blocks is the unique 2-set.
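For the special case in which I and G are single towers, the distance m + |T| can therefore be computed by counting misplaced and locked blocks. The Python sketch below assumes towers are written from top to bottom; the function names and the two example problems are hypothetical.

```python
def suffixes(state):
    """Map each block to the tuple of blocks from itself down to the table."""
    return {t[i]: tuple(t[i:]) for t in state for i in range(len(t))}

def misplaced_blocks(initial, goal):
    """A block is well_placed when it and everything beneath it coincide in I and G."""
    si, sg = suffixes(initial), suffixes(goal)
    return {b for b in si if si[b] != sg[b]}

def locked_blocks(initial, goal):
    """Misplaced blocks that are above some common block b' in both I and G."""
    si, sg = suffixes(initial), suffixes(goal)
    return {b for b in misplaced_blocks(initial, goal)
            if set(si[b][1:]) & set(sg[b][1:])}

def tower_distance(initial, goal):
    """m + |T| for the case in which I and G are single towers."""
    return len(misplaced_blocks(initial, goal)) + len(locked_blocks(initial, goal))

print(locked_blocks([[1, 2, 3]], [[2, 1, 3]]), tower_distance([[1, 2, 3]], [[2, 1, 3]]))
print(locked_blocks([[1, 2, 3]], [[3, 2, 1]]), tower_distance([[1, 2, 3]], [[3, 2, 1]]))
```

In the first problem blocks 1 and 2 are both above block 3 in I and in G, so each must be moved twice; in the second no block is locked and each block is moved once.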

We now propose an algorithm (the (I-II-III)-algorithm) which aims at finding
an optimal path from I to G:

The (I-II-III)-Algorithm 

I. If there is a constructive move (a move that increases the number of 
well_placed blocks), make such a move. 

II. If there is no constructive move and there is a clear locked block, move such 
a block to the table. 

III. If there are no constructive moves and no clear locked blocks, move a clear 
block with minimal deficiency to the table. 

Let us now explain step III of the (I-II-III)-algorithm (that is,
put on the table a misplaced clear block with smallest deficiency). Essentially, step III
consists in unblocking a well_placed or proximal (= next-needed) block by removing as
few blocks as possible.

Definition. Let S be a state and G the goal state. A misplaced block b is proximal
(with respect to (S,G)) if there exists x such that x is a clear well_placed block or
the table, and b is on x in G.

More informally, b is proximal (= next-needed) if, after removing the blocks above
b, one can make a constructive move (on(b,x)). (A constructive move is one which
increases the number of well_placed blocks.) In a sense, proximal blocks are misplaced
but close to being well_placed, once the blocks above them are removed.

The deficiency def(b) of a misplaced clear block b is defined as follows:

Definition. Let S be a state, G the goal state and b a misplaced clear block of S which
is not proximal. If no blocks below b are well_placed or proximal, define def(b) = ∞.

Otherwise def(b) = n - r, where n and r are as follows:

1. $b_1, \ldots, b_n$ are n blocks such that $b_1 = b$ and $b_j$ is on $b_{j+1}$ in S ($j = 1, \ldots, n-1$).

2. $b_1, \ldots, b_n$ are neither well_placed nor proximal.

3. $b_n$ is on a well_placed or proximal block in S.

4. r is the number of blocks in $b_1, \ldots, b_n$ which are locked.

In other words, the deficiency of b, in case it is not ∞, is the number of blocks that
must be removed from the tower to which b belongs in order to reach a well_placed or
proximal block, with the proviso that locked blocks do not count.

Step III now reads: If no constructive moves can be made and no clear block is 
locked then put a misplaced clear block of smallest deficiency on the table. 

Notice that the (I-II-III)-algorithm is complete since a misplaced clear block will 
always have a deficiency associated with it. 
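The three steps, together with the definitions of proximal, locked and deficiency, can be put together in a short Python sketch. It is only one possible reading of the algorithm: ties are broken by tower order, and, as an assumption added here, step III only considers misplaced clear blocks that are not already on the table, so that every move changes the state. States are lists of towers written from top to bottom, and 0 denotes the table.

```python
import math

TABLE = 0  # the table is denoted by 0, as in theorem 3.1

def suffixes(state):
    """Map each block to the tuple of blocks from itself down to the table."""
    return {t[i]: tuple(t[i:]) for t in state for i in range(len(t))}

def move(state, block, dest):
    """Return a new state in which the clear block `block` is placed on `dest`."""
    towers = [list(t) for t in state]
    next(t for t in towers if t[0] == block).pop(0)
    towers = [t for t in towers if t]
    if dest == TABLE:
        towers.append([block])
    else:
        next(t for t in towers if t[0] == dest).insert(0, block)
    return towers

def plan(initial, goal):
    """Apply steps I, II and III until every block is well_placed."""
    state, sg, moves = [list(t) for t in initial], suffixes(goal), []
    while True:
        si = suffixes(state)
        wp = {b for b in si if si[b] == sg[b]}            # well_placed blocks
        if len(wp) == len(si):
            return moves
        clear = [t[0] for t in state]
        support = lambda b: sg[b][1] if len(sg[b]) > 1 else TABLE
        ready = lambda x: x == TABLE or (x in clear and x in wp)
        proximal = lambda b: b not in wp and ready(support(b))
        locked = lambda b: b not in wp and bool(set(si[b][1:]) & set(sg[b][1:]))

        def deficiency(b):
            count = 0                                     # non-locked blocks seen so far
            for blk in si[b]:
                if blk in wp or proximal(blk):
                    return count
                if not locked(blk):
                    count += 1
            return math.inf

        constructive = [b for b in clear if proximal(b)]
        if constructive:                                  # step I
            b, dest = constructive[0], support(constructive[0])
        elif any(locked(b) for b in clear):               # step II
            b, dest = next(b for b in clear if locked(b)), TABLE
        else:                                             # step III
            movable = [b for b in clear if b not in wp and len(si[b]) > 1]
            b, dest = min(movable, key=deficiency), TABLE
        moves.append((b, dest))
        state = move(state, b, dest)

print(plan([[3, 1], [2]], [[1, 2, 3]]))          # situation of figure 2.1
print(plan([[3, 1], [4, 2]], [[4, 1], [3, 2]]))  # situation of figure 3.1
```

Each move is reported as a pair (block, destination). On the figure 3.1 situation the sketch uses three moves, which agrees with m + |T| = 2 + 1.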

The following is equivalent to theorem 3.1 (Gupta and Nau). 
There is an optimal path (= plan) from I to G in which the following rules are
respected:

1) Never move a well_placed block.

2) If a constructive move (a move that increases the number of well_placed
blocks) can be made, make it; if not, move a misplaced block to the table. (Notice
that 2 implies 1.)

One can see that in a plan respecting 1) and 2) no block is moved more than twice.

This theorem justifies step I in the (I-II-III)-algorithm. If in a state more than one
constructive move can be made, it does not matter which one is chosen. However, the
main problem is:

In order to build a near optimal plan that is optimal for many blocks world problems,
if no constructive move can be made, which block must be moved to the table?
This is answered by steps II and III of the (I-II-III)-algorithm.





[Figure: Goal State and Initial State of the example; the tower layout is not recoverable from the extracted text.]

Fig. 3.3. A 17-block example

When applying the (I-II-III)-algorithm to the example shown in figure 3.3, the analysis
proceeds as follows:

Since there are no constructive moves, step I cannot be used; step II cannot be applied
either, since there are no locked blocks. Therefore step III is our last hope, and it
solves the problem: the deficiencies of the blocks 1, 11, 9 and 19 are ∞, 1, 2 and
∞ respectively, so step III suggests moving 11 to the table, since it is the block
with the lowest deficiency (1). The rest of the moves are constructive moves (step I) and
the number of moves it takes to go from the Initial State to the Goal State is 18, which
is optimal. In fact the optimal path is unique.

3.2 The (I-II)-Algorithm 

This section shows that problems where the initial or goal state has less than two
misplaced towers of height > 1 are solved by the (I-II)-Algorithm (defined below) and are
therefore easy. Also the distance from I to G in such problems is the sum of the number
of misplaced blocks and the number of locked blocks. Notice that problems where
the initial or goal state consists of a single tower have this property.

A.G. Romero and R. Alquezar

Definition. A problem (I,G) will be called critical if no constructive moves can be 
made and no clear block in I is locked. 

Proposition 1. If (I,G) is critical then both I and G have at least two misplaced towers 
of height > 1 . 

Proof. Since no constructive moves can be made, by Slaney and Thiebaux (1996), page
1210 [9], there is a clear deadlock $(m_0, m_1, \ldots, m_n)$ in (I,G). This means that the blocks
$m_i$ are clear in I and misplaced, $m_0 = m_n$, and Below_G($m_i$) intersects Below_I($m_{i+1}$)
($i = 0, 1, \ldots, n-1$), where Below_S(b) is the set of blocks below b in S. This implies that
no $m_i$ is on the table in I and no $m_i$ is on the table in G.

We claim that $m_i \neq m_{i+1}$. If we had $m_i = m_{i+1}$ then Below_G($m_i$) would intersect
Below_I($m_i$), which is impossible because $m_i$ is not locked. Hence $m_i \neq m_{i+1}$.

Therefore the tower whose top is $m_i$ and the tower whose top is $m_{i+1}$ are two different
misplaced towers of height > 1. Thus I has the desired property.

Now, suppose G has less than two misplaced towers of height > 1. Then all $m_i$ belong
to a single tower T of G. We claim that $m_i$ is above $m_{i+1}$ in G ($i = 0, 1, \ldots, n-1$). Let
b be a block below $m_i$ in G and below $m_{i+1}$ in I. Clearly b is in T. The block $m_{i+1}$ cannot
be above b in G since $m_{i+1}$ is not locked. Hence $m_{i+1}$ is below b and, therefore, below
$m_i$ in G. Thus, in G, $m_0$ is above $m_n$. But this is impossible because $m_0 = m_n$. Therefore
G has at least two misplaced towers of height > 1. □

The (I-II)-Algorithm 

I. If there is a constructive move (a move that increases the number of
well-placed blocks), make such a move.

II. If there is no constructive move and there is a clear locked block, move such a
block to the table.

This algorithm is not complete in general, of course, because it gets stuck when 
one arrives at a state I such that (I,G) is critical. Proposition 1 states that this never 
happens if the initial or goal state has less than two misplaced towers of height > 1 (in 
particular this holds if I or G consists of a single tower). 
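
The two steps can be sketched in Python as follows. This is only an illustrative
sketch under stated assumptions, not the authors' implementation: a state is a list
of towers, each tower listed from top to bottom (the reading that makes the 5-block
examples of this paper critical); a block is taken to be well placed when its whole
support chain agrees in I and G; and, following the usage in the proof of
Proposition 1, a misplaced block b is taken to be locked when Below_I(b) and
Below_G(b) intersect. The function and variable names are hypothetical.

```python
def supports(state):
    """Map each block to the block it rests on (None means the table)."""
    on = {}
    for tower in state:  # towers are listed top first (assumption)
        for upper, lower in zip(tower, tower[1:]):
            on[upper] = lower
        on[tower[-1]] = None
    return on


def below(on, b):
    """Set of blocks below b in the state described by `on`."""
    found = set()
    while on[b] is not None:
        b = on[b]
        found.add(b)
    return found


def well_placed(b, on_i, on_g):
    """True if b and everything beneath it sit exactly as in the goal."""
    while on_i[b] == on_g[b]:
        if on_i[b] is None:
            return True
        b = on_i[b]
    return False


def move(state, b, dest):
    """Return a new state with clear block b moved onto dest (None = table)."""
    new = [list(t) for t in state]
    for t in new:
        if t[0] == b:
            t.pop(0)
            break
    new = [t for t in new if t]
    if dest is None:
        new.append([b])
    else:
        for t in new:
            if t[0] == dest:
                t.insert(0, b)
                break
    return new


def solve_i_ii(initial, goal):
    """Apply steps I and II until the goal is reached.

    Returns the list of moves (block, destination), or None when the
    current problem is critical and neither step applies.
    """
    on_g = supports(goal)
    state, plan = [list(t) for t in initial], []
    while True:
        on_i = supports(state)
        clear = [t[0] for t in state]
        misplaced = {b for b in on_i if not well_placed(b, on_i, on_g)}
        if not misplaced:
            return plan
        step = None
        for b in clear:  # step I: constructive move
            target = on_g[b]
            if b in misplaced and (
                target is None
                or (target in clear and well_placed(target, on_i, on_g))
            ):
                step = (b, target)
                break
        if step is None:
            for b in clear:  # step II: clear locked block to the table
                if b in misplaced and below(on_i, b) & below(on_g, b):
                    step = (b, None)
                    break
        if step is None:
            return None  # (state, goal) is critical
        plan.append(step)
        state = move(state, *step)
```

For a single-tower problem such as solve_i_ii([[1, 2, 3]], [[1, 3, 2]]) the sketch
returns a plan whose length equals the number of misplaced blocks plus the number of
locked blocks, while for a critical problem it returns None.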

Proposition 2. Let (I,G) be a problem such that I or G has less than two misplaced 
towers of height > 1 . Then the (I-II)-algorithm finds an optimal path from I to G and 
the distance from I to G is: (number of misplaced blocks)+(number of locked blocks). 

Proof. By Proposition 1, (I,G) is not critical and therefore the (I-II)-algorithm can
be applied initially. But notice that if this algorithm is applied to a state with
less than two misplaced towers of height > 1, the resulting state has the same
property. It follows that the algorithm can be applied at any stage. Since the moves
of the (I-II)-algorithm are good, that is, they get us closer to G, we eventually
reach the goal following an optimal path. The number of moves of this path is the
number of constructive moves + the number of locked blocks. This is the distance
from I to G.
3.2 The 5-Blocks World

Most of the 5-block world problems are solved using the (I-II)-algorithm since their
initial or goal state has less than two misplaced towers of height > 1. It is then
desired to find problems with 5 blocks in which no step I (= constructive) or step II
(= locked) move is possible.

It can be assumed, by renumbering the blocks if necessary, that any problem with 
5 blocks is equivalent to one in which the goal state is one of the following seven: 

1. [ [1,2,3,4,5] ]
2. [ [1,2,3,4], [5] ]
3. [ [1,2,3], [4], [5] ]
4. [ [1,2], [3], [4], [5] ]
5. [ [1], [2], [3], [4], [5] ]
6. [ [1,2], [3,4], [5] ]
7. [ [1,2,3], [4,5] ]

If the goal is one of the first five, there is always a step I or a step II move 
available. 

If the goal state is [ [1,2], [3,4], [5] ] and the initial state is one of:

1. [ [3,2], [1,5,4] ]
2. [ [3,2], [1,4,5] ]
3. [ [3,2], [1,4], [5] ]

then no step I or step II move is possible.

If the goal state is [ [1,2,3], [4,5] ] and the initial state is one of:

4. [ [4,3,2], [1,5] ]
5. [ [4,3,1], [2,5] ]
6. [ [2,5,1], [4,3] ]
7. [ [2,1,5], [4,3] ]
8. [ [2,5], [4,3], [1] ]
9. [ [1,5], [4,3], [2] ]
10. [ [1,5], [4,2], [3] ]

then no step I or step II move is possible.

These are the only situations with 5 blocks in which no step I or step II move is 
possible. In any of them any step III move is optimal. Thus the (I-II-III)-algorithm is 
100% effective in solving optimally the 5-blocks problem. 
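
The ten situations listed above can be checked mechanically. The following Python
sketch is an illustration under assumptions, not the authors' code: towers are read
from top to bottom, a block is well placed when its whole support chain agrees in
the initial and goal states, and a misplaced block b is locked when Below_I(b) and
Below_G(b) intersect. All names (is_critical, CASES, and so on) are hypothetical.

```python
def supports(state):
    """Map each block to the block it rests on (None means the table)."""
    on = {}
    for tower in state:  # towers are listed top first (assumption)
        for upper, lower in zip(tower, tower[1:]):
            on[upper] = lower
        on[tower[-1]] = None
    return on


def below(on, b):
    """Set of blocks below b."""
    found = set()
    while on[b] is not None:
        b = on[b]
        found.add(b)
    return found


def well_placed(b, on_i, on_g):
    """True if b and everything beneath it sit exactly as in the goal."""
    while on_i[b] == on_g[b]:
        if on_i[b] is None:
            return True
        b = on_i[b]
    return False


def is_critical(initial, goal):
    """No constructive move exists and no clear block is locked."""
    on_i, on_g = supports(initial), supports(goal)
    clear = {tower[0] for tower in initial}
    for b in clear:
        if well_placed(b, on_i, on_g):
            continue
        target = on_g[b]
        if target is None or (target in clear and well_placed(target, on_i, on_g)):
            return False  # a step I (constructive) move exists
        if below(on_i, b) & below(on_g, b):
            return False  # a step II (locked block) move exists
    return True


GOAL_6 = [[1, 2], [3, 4], [5]]
GOAL_7 = [[1, 2, 3], [4, 5]]
CASES = [
    (GOAL_6, [[3, 2], [1, 5, 4]]),
    (GOAL_6, [[3, 2], [1, 4, 5]]),
    (GOAL_6, [[3, 2], [1, 4], [5]]),
    (GOAL_7, [[4, 3, 2], [1, 5]]),
    (GOAL_7, [[4, 3, 1], [2, 5]]),
    (GOAL_7, [[2, 5, 1], [4, 3]]),
    (GOAL_7, [[2, 1, 5], [4, 3]]),
    (GOAL_7, [[2, 5], [4, 3], [1]]),
    (GOAL_7, [[1, 5], [4, 3], [2]]),
    (GOAL_7, [[1, 5], [4, 2], [3]]),
]
results = [is_critical(initial, goal) for goal, initial in CASES]
```

Under these assumptions every entry of results is True, which agrees with the claim
that neither step I nor step II applies in these ten situations.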

4 Conclusions and Future Work 

The (I-II-III)-algorithm can solve all 5-block world problems (bwp) in an optimal
fashion. It can also solve optimally many other classes of bwp, as can be seen from
the 17-block example shown previously. To solve examples where the initial state, or
the goal state, has less than two towers of height greater than one, it is
sufficient to employ the (I-II)-algorithm.

As was done for Slaney and Thiebaux's algorithm (GN2 with A sequence) in [10], the
(I-II-III)-algorithm can be programmed to build a near-optimal Blocks World solver.
Once built, this solver could be tested using BWSTATES, a program built by Slaney
and Thiebaux which generates random BW states with uniform distribution (see [10],
pages 122-128).

Besides being an optimal algorithm for a large number of instances, it highlights
the key ideas that should be learned for this domain. In particular, it helps to
know that certain rules, when fired, not only approach the goal but approach it
optimally, for instance putting clear locked blocks on the table.

BW’s place as an experimentation benchmark constitutes a basis for investigating 
other problems of more practical interest. The abstract problem of which BW is an 
instance is the following: sets of actions producing the goal conditions (constructive 
actions) cannot be consistently ordered so as to meet all of their preconditions (that is, 
step II and step III actions are needed). To resolve each locked-block situation and
each minimal-deficiency-block situation, a number of additional, non-constructive
actions (step II and step III) have to be introduced. Since these situations are not
independent, there is a need to reason about how to resolve them all using as few 
additional actions as possible. This core problem is present in other more realistic 
situations. For instance, moving blocks is not very different from moving more excit- 
ing objects such as packages, trucks and planes. Again, the version of BW in which 
the table has a limited capacity captures the essence of the container loading problem, 
a problem that is crucial to efficiency of freight operations [8,11]. Therefore, finding 
the right generalisations of strategies that are effective for BW appears promising as 
an approach to more sophisticated problems. 

Abstract. When a Genetic Algorithm is used to tackle a constrained
problem, it is necessary to set a penalty weight for each constraint type,
so that, if an individual violates a given constraint, it will be penalized
accordingly. Traditionally, penalty weights remain static throughout the
generations. This paper presents an approach to allow the adaptation
of weights, where the penalty function takes feedback from the search
process. Although the idea is not new, since other related approaches
have been reported in the literature, the work presented here considers
problems which contain several kinds of constraints. The method is
successfully tested on the congress timetabling problem, a difficult problem
with many practical applications. Further analysis is presented to
support the efficiency of the technique.



1 Introduction 

Genetic Algorithms (GAs) [1, 2] are an optimization technique that has shown 
very good performance for a variety of combinatorial and constrained optimiza- 
tion problems [3]. Individuals in a population are evaluated by a fitness function, 
which along with other operators such as selection, crossover and mutation, guide 
the search to promising regions of the solution space. For a constrained prob- 
lem, the fitness function usually is represented as a penalty function in which the 
penalty weights for each type of constraint are set at the beginning and remain 
static throughout the generations. The penalty function computes the degree of 
constraint violation for a solution, and this is used to penalize each individual 
accordingly. 

The main drawback of this approach, however, is precisely the arbitrary selection
of these coefficients. Usually, these weights are set according to the importance
of the constraints. An alternative is to run a series of experiments in order to
tune the weights. But what is the problem with fixing these weights at the
beginning? It is not necessarily true that setting a high penalty for a constraint
means that the final solution will satisfy that constraint; sometimes giving a low
penalty to an important constraint results in a solution that favors that
constraint. Would it be possible to foresee all these cases? An alternative path
for tackling these problems is to use a set of techniques that adapt the weights
by incorporating feedback from the search process. A complete survey of these
strategies is presented and summarized by Michalewicz and Schmidt [4]. That article
addresses issues about the general nonlinear programming problem and the emerging
approaches for solving it. The idea is to adjust the penalties according to the
difficulty of the given constraint at a given stage of the solution process.
Usually, constraints can be of two kinds: equalities and inequalities. In general
terms, inequalities are easier to satisfy. For instance, if a variable x is
required to take a value less than or equal to zero, then its domain is (-∞, 0].
In contrast, for an equality a specific value has to be assigned to a given
variable.

Let us assume we have a constraint R_1. When a solution Z satisfies that constraint,
Z is feasible with respect to the constraint. Otherwise Z is considered infeasible.
In evolutionary computation, the fitness function serves as a way of rating
individuals (solutions) in the population. Some individuals may be feasible whereas
others are infeasible. A percentage of feasible solutions can be computed for a
particular population, and this can be used (1) to detect the hardness of a given
constraint, that is, the lower the feasibility the harder the constraint; and (2)
to keep a percentage of feasible individuals for a given constraint, adjusting that
percentage by applying genetic operators. It is known that an optimal solution for
a problem is usually surrounded by a number of infeasible solutions, and sometimes
the optimal solution can be found around the boundary between the feasible and
infeasible regions [5]. Hamida [6] has proposed a technique which is based on the
feasibility percentage. This idea is extended in this paper to solve a more
constrained problem, namely congress timetabling.

The goal of the work presented here is to introduce adaptability of penalty
weights to a highly constrained problem with different kinds of constraints, and,
by doing so, to provide a general procedure for including adaptability in a
constrained problem.

This article is organized as follows. Section 2 presents the methodology followed 
to carry out this investigation. Section 3 establishes the experimental set up, the 
results and their discussion. Finally, in section 4 we include our conclusions. 



2 Methodology 

In what follows we describe the key steps followed to achieve the goal of this 
investigation. First, the Congress Timetabling Problem is defined and the im- 
plementation details of a problem generator are presented. Next, the adapted 
Genetic Algorithm, its representation and parameters are introduced. Based on 
the original algorithm for adapting penalty weights, two new algorithms were 
designed taking into account particular features of the congress timetabling prob- 
lem and its constraints. 

2.1 The Congress Timetabling Problem 

The problem of assigning events or activities to resources such as time slots,
physical space or personnel, in such a way that various constraints are satisfied,
is known as the timetabling problem [7]. More formally, the problem can be
described as a set A of events, a set D of time slots, and a set C of constraints.
The aim is to assign the events to the time slots satisfying all the constraints
[8]. For a more complete survey on timetabling refer to the work by Schaerf [9].
For this research, a specific kind of timetabling problem was chosen, the congress
timetabling problem, with the following features:

1. In most of the timetabling problems it is assumed that all time slots are of 
the same duration. In our problem, duration of time slots is variable. 

2. There is no event overlapping, i.e. there exists only one event for every unit 
of time. 

3. The domain of an event consists of a set of minutes, instead of a time lapse.

4. The constraints that are considered in this work are PRESET, EXCLUDE,
ORDER and TIME. PRESET(x, y) establishes that event x should be scheduled
at exactly y hours. EXCLUDE(x, *a) establishes that event x must not happen
at certain times given in array a. ORDER(x, w) requires event x to be
scheduled just before some other event w. TIME(t) indicates that the sum of
the durations of all scheduled events should not be greater than t.

2.2 Generator of Congress Timetabling Problems 

A problem generator was implemented to create an experimental testbed. There 
are several parameters that were established according to features found in real 
congress timetabling problems. The number of days for a congress can be be- 
tween 3 and 5 days; duration of each day varies between 450 and 675 minutes, 
depending on the number of activities per day that can be between 15 and 35, 
and the activity duration that can be between 1 and 90 minutes. 80% of the 
activities are involved in at least one constraint. If an event is constrained by
PRESET, then that event cannot be involved in any other kind of constraint. There
is a probability of 0.25 of getting a PRESET constraint, 0.45 for EXCLUDE, and
0.30 for ORDER.
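
A generator with these parameter ranges could be sketched as follows. This is a
simplified illustration, not the generator used in the paper: the function name and
data layout are hypothetical, day lengths are drawn independently of the activities
(the text says they depend on them), and each constrained activity receives exactly
one constraint kind without its arguments.

```python
import random


def generate_problem(rng):
    """Draw a random congress timetabling instance (simplified sketch)."""
    days = rng.randint(3, 5)                      # 3 to 5 days
    day_lengths = [rng.randint(450, 675) for _ in range(days)]  # minutes
    activities = []
    for _ in range(days):
        for _ in range(rng.randint(15, 35)):      # activities per day
            activities.append({"duration": rng.randint(1, 90),
                               "constraint": None})
    for activity in activities:
        if rng.random() < 0.80:                   # 80% are constrained
            # Assumption: one constraint kind per constrained activity.
            activity["constraint"] = rng.choices(
                ["PRESET", "EXCLUDE", "ORDER"],
                weights=[0.25, 0.45, 0.30],
            )[0]
    return day_lengths, activities


day_lengths, activities = generate_problem(random.Random(0))
```

Passing a seeded random.Random instance keeps the generated testbed reproducible.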

2.3 Solving the Problem with GAs 

Given that activities must not overlap and the goal is to find a good ordering 
of events, it is natural to use a permutation-based representation. This kind of 
representation has been studied extensively in the literature [1,10]. Each chro- 
mosome represents a complete solution to the problem with n activities to be 
scheduled in the given order. Each event is assigned one by one until a day is 
complete. Three variations were tested for doing this: 

CGA1.- It sequentially assigns activities until there is one that exceeds the
time limit of the day being filled. The next activity is then scheduled on the
following day.

CGA2.- Similar to CGA1, but the exceeding activity is scheduled on the
following day.

CGA3.- In addition to CGA2, the PRESET constraint was considered. If
an activity involves a constraint of this kind, then it is scheduled at the time
requested, if possible. Otherwise, it is shifted to the right to the first interval
in which it fits.
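
The difference between CGA1 and CGA2 can be sketched as a decoder from a
permutation to days. This is one reading of the descriptions above, not the
authors' code: in CGA1 the exceeding activity is assumed to stay in the day being
filled, a single day limit is assumed for all days, and the names decode,
durations and day_limit are hypothetical.

```python
def decode(order, durations, day_limit, variant="CGA2"):
    """Split a permutation of activities into days (illustrative sketch)."""
    days, current, used = [], [], 0
    for activity in order:
        length = durations[activity]
        if current and used + length > day_limit:
            if variant == "CGA1":
                # Assumed reading: the exceeding activity stays in this day
                # and the next activity opens the following day.
                current.append(activity)
                days.append(current)
                current, used = [], 0
                continue
            # CGA2: the exceeding activity itself opens the following day.
            days.append(current)
            current, used = [], 0
        current.append(activity)
        used += length
    if current:
        days.append(current)
    return days


durations = {"a": 200, "b": 200, "c": 200, "d": 100}
order = ["a", "b", "c", "d"]
schedule_cga1 = decode(order, durations, 450, variant="CGA1")
schedule_cga2 = decode(order, durations, 450, variant="CGA2")
```

With these toy durations CGA1 lets activity c overflow the first day, whereas CGA2
moves it to the second day.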

The objective function used to evaluate each chromosome x is:

    f(x) = \frac{1}{1 + P(x)}    (1)

where

    P(x) = c_{time} p_{time} + c_{preset} p_{preset} + c_{order} p_{order} + c_{exclude} p_{exclude}    (2)

where p_{time}, p_{preset}, p_{order} and p_{exclude} are the penalty weights for
each constraint type, and c_{time}, c_{preset}, c_{order} and c_{exclude} are the
degrees of violation for each constraint.

It is expected that the chromosome provides an order resulting in a schedule
whose total time equals the total duration of the congress. This is the constraint
TIME. If not, the individual is penalized in relation to the extra minutes used.
The PRESET constraint is penalized with the difference in minutes between the
actual scheduled time and the time at which the event should have been scheduled.
The ORDER constraint is penalized by the number of activities that must be
skipped in order to have the first activity just before the second one. To
penalize the constraint EXCLUDE, the number of minutes overlapping the prohibited
time lapse was used.

Penalty weights for the constraints were set according to their importance,
following the advice of a human expert in congress scheduling along with
experimental tuning. The weight for the TIME constraint is 0.01; the weights are
0.005, 0.003 and 0.05 for the PRESET, EXCLUDE and ORDER constraints, respectively.
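
Equations (1) and (2) with these static weights can be written as a small Python
sketch. The dictionary layout and function names are hypothetical; only the four
weight values and the two formulas come from the text.

```python
# Static penalty weights reported in the text, keyed by constraint type.
WEIGHTS = {"time": 0.01, "preset": 0.005, "exclude": 0.003, "order": 0.05}


def penalty(violations, weights=WEIGHTS):
    """P(x): weighted sum of the degrees of violation, equation (2)."""
    return sum(weights[kind] * violations.get(kind, 0.0) for kind in weights)


def fitness(violations, weights=WEIGHTS):
    """f(x) = 1 / (1 + P(x)), equation (1)."""
    return 1.0 / (1.0 + penalty(violations, weights))


# Hypothetical individual: 10 extra minutes, 100 minutes off a preset time,
# 2 activities to skip for an ORDER constraint, 5 excluded minutes overlapped.
example = {"time": 10, "preset": 100, "order": 2, "exclude": 5}
example_penalty = penalty(example)
example_fitness = fitness(example)
```

An individual with no violations has P(x) = 0 and therefore the maximum fitness
of 1; any violation pushes the fitness below 1.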



2.4 Parameter Tuning 

It is known that a GA usually needs a process of tuning to identify the pa- 
rameter set that behaves best against a set of problems. To do this, 9 random 
problems were generated, and each was tested with 15 runs for different param- 
eter configurations (population size, selection, crossover, mutation, and number 
of generations). Table 1 shows the best configuration for each GA with static 
penalty weights. Variation CGA2 performed better than the other two. This al- 
gorithm was selected to include the adapting model for penalty weights. In the 
following sections CGA2 will be renamed as CGA. 



2.5 Algorithm for Adaptive Penalties 

The original method suggested by Hamida and Schoenauer [6] for adapting
penalty weights modifies the penalty coefficients according to the proportion
of feasible individuals in the population. Their strategy combines feasible with
infeasible individuals in order to explore the region around the feasible domain,
and it uses multiplication and division operations for adjusting the penalties.
The static parameter fact is the constant that is used to multiply (to increase)
or divide (to decrease) the current penalty to compute the new penalty that will
be used in the next generation (or in a set of generations). tTarget is another
parameter, which determines the target proportion of feasible individuals in the
population. The original technique, adapted for our problem with several kinds of
constraints, was called CGAA1.

Table 1. Final parameter set for each GA with static penalty weights

parameter/algorithm      CGA1       CGA2       CGA3
Selection                Tn         Tn         Tm
Tournament size          3          3          4
Crossover                GOX        GPX        GPX
Crossover probability    0.95       1.00       0.90
Mutation                 Swap       Order      Order
Mutation probability     0.65       0.34       0.34
Population size          350        350        200
Fitness average          0.545264   0.650261   0.201606

The effect of the parameter fact on the penalty adjustment is the same regardless
of how close the proportion of feasible individuals is to tTarget. In other words,
the adjustment is the same whether a constraint is about to meet the target
proportion of feasible individuals or is far from it. To correct this problem the
variable error was introduced, which is the difference between the actual
proportion of feasible individuals (tT) and the target proportion (tTarget):
error = tT - tTarget. So, the two rules suggested by Hamida and Schoenauer are
merged into one:

    α_i[t + 1] = α_i[t] - (tTrain × error)    (3)

α_i[t] is the parameter that is adjusted depending on the error for a constraint of
type i. If error is positive (there are more feasible individuals than the target),
α_i[t] is decreased. Otherwise, α_i[t] is increased. α_i[0] starts with a value
equal to the penalty weight (used in the static algorithm) for constraint i. The
constant tTrain helps to adjust the penalty in terms of the penalty units, which in
this case are of the order of 10^-3. This technique was called penalty adaptability
based on error, or CGAA2.
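
The two update rules can be contrasted in a short Python sketch. The function
names are hypothetical, and leaving the penalty unchanged when the proportion of
feasible individuals equals the target exactly is an assumption of this sketch,
not something stated in the text.

```python
def update_cgaa1(penalty, t_feasible, t_target, fact):
    """Multiply/divide rule in the style of Hamida and Schoenauer (CGAA1)."""
    if t_feasible < t_target:
        return penalty * fact   # too few feasible individuals: increase
    if t_feasible > t_target:
        return penalty / fact   # too many feasible individuals: decrease
    return penalty              # assumption: no change at the target


def update_cgaa2(alpha, t_feasible, t_target, t_train):
    """Error-based rule of equation (3): alpha - tTrain * error."""
    error = t_feasible - t_target
    return alpha - t_train * error


# Hypothetical situation: 30% feasible individuals, target 50%.
new_cgaa1 = update_cgaa1(0.01, 0.30, 0.50, fact=1.1)
new_cgaa2 = update_cgaa2(0.01, 0.30, 0.50, t_train=0.05)
```

In CGAA1 the size of the step depends only on fact, whereas in CGAA2 it shrinks as
the proportion of feasible individuals approaches the target.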

2.6 Adaptation Parameters 

In order to improve the adapting process for penalty weights some additional 
parameters were used, for both the CGAA1 and CGAA2, having in mind the 
different types of constraints involved in the congress timetabling problem. 

Relaxation. The aim is to transform an equality into an inequality and transform
this gradually back into an equality. For instance, R_2 → x = 5 represents
a constraint called R_2 that limits the domain of variable x to {5}; using
relaxation, the domain of x is [(5 - relaxation), (5 + relaxation)], as shown by
R_2 → (5 - relaxation) ≤ x ≤ (5 + relaxation). To determine the initial
relaxation, it was necessary to analyze the equality constraints: TIME and
PRESET. With respect to TIME, it was observed that some particular features
of the problem are related to the initial behavior of TIME. A backpropagation
neural network with 4 inputs and 1 output was used to establish the initial
relaxation. The inputs are the total duration of the congress, the number of days
in the congress, the number of activities in the congress, and the proportion of
feasible individuals in the initial population; the output is the relaxation
needed to produce, for a given problem, the right proportion of feasibles in the
initial population for the TIME constraint. For the constraint PRESET a particular
relationship between the problem characteristics and its behavior was not found,
but the average duration of the activities was used for relaxing this constraint.
To make sure that the given relaxation was right for this problem, it was still
slightly adjusted by multiplying it by various constants. For TIME the
constants were 0.5, 1.0, 1.5 and 2.0. For PRESET: 0.5, 1.0, 2.0, 3.0 and 4.0.

tMax. When the proportion of feasible individuals is greater than tMax the
relaxation value is reduced by 1.
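
The interplay between relaxation and tMax can be sketched as follows. The function
names are hypothetical, and clamping the relaxation at zero is an assumption made
here so that the constraint ends up as the original equality.

```python
def is_feasible_relaxed(x, required, relaxation):
    """Relaxed equality: required - relaxation <= x <= required + relaxation."""
    return abs(x - required) <= relaxation


def shrink_relaxation(relaxation, proportion_feasible, t_max):
    """Reduce the relaxation by 1 once more than tMax individuals are feasible."""
    if proportion_feasible > t_max:
        return max(0, relaxation - 1)  # assumption: never below zero
    return relaxation


# Constraint R_2: x = 5, relaxed by 1 unit on each side.
inside = is_feasible_relaxed(6, 5, 1)
outside = is_feasible_relaxed(7, 5, 1)
tighter = shrink_relaxation(3, proportion_feasible=0.80, t_max=0.70)
```

As the population adapts and the proportion of feasible individuals passes tMax,
the relaxed interval is narrowed step by step back towards the equality.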

tRel. It is reasonable to think that when a solution is feasible with respect to a
specific constraint thanks to relaxation, the individual should not be penalized,
or at least not as heavily as an infeasible solution. This parameter was introduced
to test different alternatives for penalizing this kind of individual: (1) penalize
all individuals according to their actual degree of violation, (2) do not penalize
these individuals, but penalize the infeasible ones in relation to the actual
degree of violation, (3) do not penalize these individuals, but penalize the
infeasible ones in relation to the difference between the actual degree of
violation and the relaxation, and (4) penalize these individuals with a function
(relaxation / degree) and penalize the infeasible ones with
(degree - relaxation + 1).

Lapse. The idea behind this parameter comes from the work reported by Eiben 
for graph coloring [11]. In this model, called Stepwise Adaptation of Weights, 
the parameter is used to indicate the number of generations (or evaluations 
since there is only one individual in the population) before coloring the node 
with the highest penalty, and update the penalties for the rest of the nodes. 
Whereas this model focuses on the nodes, in our work we concentrate this effort
on the constraints. When the adaptations are large, it is required to give some
time (certain number of generations) to the GA to react to the adjustments. 
Lapse represents this number of generations and has a value between 1 and 25. 

tEnd. It specifies the termination criterion for adaptation. Search in a GA with
adaptive parameters behaves well for a number of generations; however, it often
reaches a point where the average fitness starts to decrease. It is important to
have a parameter to stop adapting. The adapting zone is a region (set of
generations) in which the constraints behave unsteadily. After passing this zone
the constraints undergo only small changes, except the constraint PRESET, and this
is used to their advantage to find better individuals. Our conjecture is that at
the end of the adapting zone we should stop adapting. To detect the end of the
adapting zone the following steps are suggested: (1) for each generation and for
each constraint, find the variance of the behavior of the best individual found so
far over the last 10 generations, (2) compute the average variance for each
constraint, and (3) if this average is less than tEnd then the adapting zone has
ended. tEnd has a value between 0.000005 and 0.000500. The behavior of this
parameter for a particular instance is shown in Figure 1, where the grey region is
the adapting zone. After experimentation, it was determined that this parameter
was only helpful for algorithm CGAA1.
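
The three detection steps admit more than one reading; the Python sketch below
takes one of them and should be treated as an assumption, not as the authors'
procedure. For a single constraint it computes the variance of the tracked value
over every window of 10 consecutive generations seen so far, averages those
variances, and compares the average with tEnd. The names adapting_zone_ended,
history and window are hypothetical.

```python
from statistics import mean, pvariance


def adapting_zone_ended(history, t_end, window=10):
    """Decide whether adaptation should stop for one constraint.

    history holds one value per generation describing the behavior of the
    best individual found so far for this constraint (assumed layout).
    """
    if len(history) < window:
        return False  # not enough generations to form a window yet
    # Step (1): variance over each window of `window` generations.
    variances = [
        pvariance(history[end - window:end])
        for end in range(window, len(history) + 1)
    ]
    # Steps (2) and (3): average the variances and compare with tEnd.
    return mean(variances) < t_end


steady = adapting_zone_ended([0.5] * 20, t_end=0.0002)
unsteady = adapting_zone_ended([0.0, 1.0] * 10, t_end=0.0002)
too_short = adapting_zone_ended([0.5] * 5, t_end=0.0002)
```

A flat history gives zero variance and therefore signals the end of the adapting
zone, while an oscillating history keeps adaptation running.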

Fig. 1. Variance on the behavior of each constraint type during search



3 Experimental Results and Discussion 

To introduce adaptation to the constraints in our problem, the following steps
were used: (1) tune the parameters separately for the equality constraints (TIME
and PRESET), (2) introduce adaptability to each of the equality constraints, (3)
introduce adaptability to the rest of the constraints, from the hardest to the
easiest, and finally, (4) tune the tEnd parameter. The parameter order for tuning
is as follows: Lapse, relaxation, fact/tTrain, tMax and tRel (only for TIME and
PRESET), and tTarget (just for EXCLUDE and ORDER). Three problems were
generated for each duration of 3, 4 and 5 days. 15 runs were carried out
for each problem. Tables 2 and 3 show the final tuning for the set of parameters.

Results obtained with these combinations are presented in Table 4 for all
three algorithms tested. Each algorithm reports results for problems with 3, 4
and 5 days in the duration of the congress. 17 different problems were used for
each duration, with 30 runs for each instance. It is observed that for constraints
TIME, PRESET, ORDER and EXCLUDE, in both algorithms CGAA1 and
CGAA2, the adaptive process produces a decrease in the constraint violation.
For instance, for constraint TIME the average number of minutes is 10.00 for
CGA (static penalty weights) in congresses lasting 4 days, while for CGAA1 it
decreases to 9.67, and for CGAA2 there is a further improvement, decreasing
to 6.06. In the last column of the table FITNESS is reported. In order to have
a point of comparison against the CGA, the best individual in the adaptation
algorithms for a particular run was evaluated using the static weights.

Table 2. Parameters for algorithm CGAA1

parameter/constraint   TIME     PRESET   ORDER    EXCLUDE
tMax                   0.70     -        -        -
lapse                  8        -        8        8
relaxation             0.50     -        -        -
fact                   1.1      -        1.1      1.1
tRel                   1        -        -        -
tEnd                   0.0002   -        0.0002   0.0002
tTarget                0.50     -        1.00     1.00

Table 3. Parameters for algorithm CGAA2

parameter/constraint   TIME     PRESET   ORDER    EXCLUDE
tMax                   0.70     0.50     -        -
lapse                  1        20       1        1
relaxation             0.50     1.00     -        -
tTrain                 0.05     0.10     0.05     0.05
tRel                   1        1        -        -
tEnd                   -        -        -        -
tTarget                0.50     0.50     1.00     0.98

Table 4. Table comparing algorithms CGA, CGAA1, and CGAA2 for different
durations of the congress (? = value not recoverable)

algorithm   days   TIME    PRESET   ORDER   EXCLUDE   FITNESS
CGA         3      ?       ?        ?       7.33      0.726058
CGA         4      10.00   102.70   ?       5.80      0.654762
CGA         5      ?       ?        ?       6.40      0.618476
CGAA1       3      ?       ?        ?       2.73      0.736826
CGAA1       4      9.67    ?        ?       5.16      ?
CGAA1       5      12.36   86.40    ?       5.73      0.647106
CGAA2       3      ?       ?        ?       ?         ?
CGAA2       4      6.06    ?        ?       ?         ?
CGAA2       5      ?       ?        ?       ?         ?

Table 5 shows results for a GA with static penalties, compared against the
adapting models CGAA1 and CGAA2, for 50 randomly generated
problems (30 runs for each). It is observed that the algorithm CGAA1 behaves
better in general than the static model, and algorithm CGAA2 behaves slightly
better than both. The degree of violation is shown for each kind of constraint
(minimizing) and in the right column the average fitness (maximizing).

Table 5. Table comparing performance of algorithms CGA, CGAA1, and CGAA2 for
50 randomly generated problems

Fig. 2. Behavior of error and α for TIME in CGAA1 (a, b) and CGAA2 (c, d)



Figure 2 presents the behavior of the error along with parameter α for
constraint TIME for algorithms CGAA1 and CGAA2, respectively. It is
observed that when (tT - tTarget) is negative, the algorithm increases the penalty
so that the number of feasibles also increases, and when (tT - tTarget) is greater
than 0, that is, when the actual proportion of feasibles exceeds tTarget, it tries
to pull it back. In CGAA1, when α equals 1 around generation 150, it means that
adaptation has been stopped. Adaptation is not halted in CGAA2.

4 Conclusions and Future Work 



It has been demonstrated through an empirical study that adapting penalty
weights in a GA for a constrained problem with multiple types of constraints
performs better than a GA with static penalty weights. Despite this fact,
adapting weights requires setting and tuning various parameters. Future work is
suggested to investigate this trade-off further. Several lessons were learned from
this experimentation that could help to tackle similar problems in the future. For
instance, the parameter tRel should not be tuned. For parameter tTarget, it was
found that for equality constraints it should be 0.50, and for inequality
constraints it should be around 0.90. Parameter tMax varies from around tTarget to
tTarget + 0.20. This work also proposed a set of steps to introduce adaptability;
it would be interesting to test whether they help when facing other similar
problems.

Abstract. In this work we address a capacitated hub problem arising
from a Telecommunications application. In this problem we must choose 
the routes and the hubs to use in order to send a set of commodities from 
sources to destinations in a given capacitated network with a minimum 
cost. The capacities and costs of the arcs and hubs are given, and the 
graph connecting the hubs is not assumed to be complete. We present 
a mixed integer linear programming formulation and describe three dif- 
ferent decomposition techniques to get better performances than simply 
using a direct general solver on the model. These approaches can be 
applied to deal with more general problems in Network Design. 



1 Introduction 

This work concerns the following optimization problem arising in the design
of a Telecommunications system. Let us consider a set I of computers (source
terminals) that must send information to a set J of computers (destination
terminals) through a network. The network consists of cables, possibly going
through some computers (routers) of another set H. The whole set of computers
is called the node set and is represented by V := I ∪ J ∪ H, and the set of cables
is called the arc set and is represented by A. Hence, we have a directed graph
G = (V, A). In general, a given computer can be at the same time source terminal,
destination terminal and router, so the subsets I, J and H are not necessarily
disjoint. However, based on the application motivating this work, we will assume
in this paper that the three computer sets are disjoint. When a cable allows
communication in both directions it is considered as two opposite arcs in A. We
will assume there are no arcs in A from a destination terminal to a source one,
i.e., there are no arcs from nodes in J to nodes in I.

Associated with each arc a ∈ A there is a capacity q_a, representing the maximum
amount of information units that can go through it, and a value c_a representing
the cost of sending a unit of information through a, called the routing cost.
Likewise, associated with each router h ∈ H there is a capacity q_h, and a value
c_h representing the fixed cost of using router h, called the maintenance cost.
Finally, for each pair (i, j) ∈ I × J of terminals we are given an amount d_ij of
information to be sent from i to j. Since most of these values can be zero, the
d_ij information units going from i to j will be termed a commodity if d_ij > 0,
and for simplicity they will be referred to with one index
k ∈ K := {(i, j) ∈ I × J : d_ij > 0}. Also for brevity of notation, we will write
d_k instead of d_ij if k = (i, j).

The optimization problem, called Capacitated Hub Problem (CHP), consists 
in deciding the way the required flow of information between terminals must 
traverse the capacitated network in order to minimize the sum of the routing 
and maintenance costs. 

A particular case of this problem, referred to as the Capacitated Multiple
Allocation Hub Location Problem, has recently been considered by Ebery,
Krishnamoorthy, Ernst and Boland [6], motivated by a postal delivery
application. In this particular combinatorial problem, where the name “router”
is replaced by “hub” and “terminal” by “client”, the capacity requirements only
concern the hubs and apply only to mail letters coming directly from a client
and going directly to a client. Moreover, the maintenance cost of a hub is paid
when it receives letters from or delivers letters to a client, and not paid if
the letter is moved from one hub to another hub. Those assumptions allow working
with a complete graph between hubs and reducing the potential paths for a letter
(as a letter will never go through more than two hubs). Paper [6] presents
mathematical models exploiting these advantages by using variables based on 3
and 4 indices. Another problem closely related to the CHP has been addressed by
Holmberg and Yuan [10]. Uncapacitated versions of the CHP have been extensively
studied in the literature (see, e.g., Campbell [4], Ernst and Krishnamoorthy
[7], Mayer and Wagner [12], Skorin-Kapov, Skorin-Kapov and O’Kelly [15]). The
CHP is also related to network design (see, e.g., Barahona [2], O’Kelly, Bryan,
Skorin-Kapov and Skorin-Kapov [14]).

In our application from telecommunications (see Almiñana, Escudero, Monge
and Sánchez-Soriano [1]) the above-mentioned assumptions do not hold, since we
have maintenance costs and capacity constraints also for routers not connected
with terminal nodes. We do not know of any previous study on the CHP as it
is defined here, and we present a mathematical formulation based on 2-index
variables. We use this model to develop an algorithm for the CHP. Some
preliminary computational results are also presented.

2 ILP Model 

To simplify the notation, it is convenient to extend the arc set of $G$ with the
dummy arcs from destinations to origins, i.e.,

$$A := A \cup \hat{K},$$

where $\hat{K}$ is the set of arcs from $j \in J$ to $i \in I$ if $d_{ij} > 0$.
For each subset $S \subset V$, we will denote

$$\delta^+(S) := \{(u, v) \in A : u \in S,\ v \in V \setminus S\}$$

and

$$\delta^-(S) := \{(u, v) \in A : u \in V \setminus S,\ v \in S\}.$$

Let us consider a decision variable $y_h$ associated with each $h \in H$ that
takes value 1 when the router is used and 0 otherwise, and a continuous variable
$x_a$ associated with each $a \in A$ representing the amount of communication
traversing cable $a$.

To present a mathematical model we also make use of an additional set of
continuous variables, $[f^k_a : a \in A]$, for each commodity $k \in K$. If
$k = (i, j)$ is the commodity going from $i$ to $j$, then variable $f^k_a$
represents the amount of communication from source $i$ to destination $j$
traversing cable $a$. Then the mathematical model is:

$$\min \sum_{a \in A} c_a x_a + \sum_{h \in H} c_h y_h \tag{1}$$

subject to:

$$x_a \le q_a \quad \text{for all } a \in A \tag{2}$$

$$\sum_{a \in \delta^+(\{h\})} x_a \le q_h y_h \quad \text{for all } h \in H \tag{3}$$

$$y_h \in \{0, 1\} \quad \text{for all } h \in H, \tag{4}$$

and there exist $f^k_a$ such that

$$x_a = \sum_{k \in K} f^k_a \quad \text{for all } a \in A \tag{5}$$

and for each commodity $k = (i, j)$:

$$\sum_{a \in \delta^+(\{v\})} f^k_a = \sum_{a \in \delta^-(\{v\})} f^k_a \quad \text{for all } v \in V \tag{6}$$

$$f^k_a \ge 0 \quad \text{for all } a \in A \tag{7}$$

$$f^k_{(j,i)} = d_k \tag{8}$$

$$f^k_{(u,v)} = 0 \quad \text{for all } (u, v) \in \hat{K} \setminus \{(j, i)\}. \tag{9}$$

Constraints (6)-(9) require the existence of a flow circulation in $G$ moving
exactly $d_k$ units of commodity $k$. Equalities (5) gather the individual flows
into a common one that must satisfy the arc-capacity requirements in (2) and the
node-capacity requirements in (3). Finally, (4) imposes that a 0-1 decision on
the routers must be taken. Thus, vector $x$ represents the total communication
units circulating through the network and is a vector of continuous variables,
while vector $y$ is a 0-1 vector deciding the routers to be installed. The main
difference with a multi-commodity flow formulation is the fact that $x$ is not
required to be a vector of integer variables. Still, the problem remains
$\mathcal{NP}$-hard due to the integrality requirement on the $y$ vector.

Magnanti and Wong [11] observed that it is computationally useful to add
the upper bound inequalities $f^k_a \le q_h y_h$ for all $k \in K$, for all
$h \in H$ and for all $a \in \delta^+(\{h\}) \cup \delta^-(\{h\})$, even if they
are redundant when inequalities (3) are also present in the model. Surprisingly,
this was also confirmed by our experiments.
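As an illustration of how constraints (2)-(9) fit together, the following minimal Python sketch checks a candidate solution and evaluates objective (1). It is not the authors' implementation: the function name, the dictionary-based data layout and the tolerance are assumptions made for this example. It also assumes that only the real cables are listed as arcs, so the circulation (6), (8), (9) through the dummy arc $(j, i)$ is checked in the equivalent form "net outflow $d_k$ at $i$, net inflow $d_k$ at $j$, balance zero elsewhere".

```python
from collections import defaultdict


def chp_cost_if_feasible(arcs, routers, commodities, x, y, f, tol=1e-9):
    """Return objective (1) if (x, y, f) satisfies (2)-(9), else None.

    arcs:        {(u, v): (capacity q_a, routing cost c_a)}, real cables only
    routers:     {h: (capacity q_h, maintenance cost c_h)}
    commodities: {(i, j): demand d_k}
    x:           {(u, v): total flow on the arc}
    y:           {h: 0 or 1}
    f:           {(i, j): {(u, v): flow of that commodity on the arc}}
    """
    if any(y[h] not in (0, 1) for h in routers):                      # (4)
        return None
    if any(x.get(a, 0.0) > q + tol for a, (q, _) in arcs.items()):    # (2)
        return None
    for h, (q_h, _) in routers.items():                               # (3)
        out_flow = sum(x.get((u, v), 0.0) for (u, v) in arcs if u == h)
        if out_flow > q_h * y[h] + tol:
            return None
    for a in arcs:                                                    # (5)
        total = sum(f[k].get(a, 0.0) for k in commodities)
        if abs(total - x.get(a, 0.0)) > tol:
            return None
    for (i, j), d in commodities.items():
        flow = f[(i, j)]
        if any(val < -tol for val in flow.values()):                  # (7)
            return None
        balance = defaultdict(float)      # outflow minus inflow per node
        for (u, v), val in flow.items():
            if (u, v) not in arcs:        # flow on a cable that does not exist
                return None
            balance[u] += val
            balance[v] -= val
        for v in set(balance) | {i, j}:                               # (6), (8), (9)
            expected = d if v == i else (-d if v == j else 0.0)
            if abs(balance[v] - expected) > tol:
                return None
    routing = sum(c * x.get(a, 0.0) for a, (_, c) in arcs.items())
    maintenance = sum(c * y[h] for h, (_, c) in routers.items())
    return routing + maintenance


# Hypothetical toy instance: one source "s", one destination "t", two routers.
toy_arcs = {("s", "h1"): (5, 1), ("h1", "t"): (5, 2),
            ("s", "h2"): (5, 10), ("h2", "t"): (5, 10)}
toy_routers = {"h1": (4, 7), "h2": (4, 3)}
toy_demand = {("s", "t"): 3}
toy_x = {("s", "h1"): 3, ("h1", "t"): 3}
toy_f = {("s", "t"): {("s", "h1"): 3, ("h1", "t"): 3}}
```

Routing all three units through router `h1` is accepted only when `h1` is open, which shows how (3) links the continuous flow to the 0-1 decision.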
In addition, we also got better computational results by extending the model
with the equations

$$\sum_{a \in \delta^+(\{v\})} x_a = \sum_{a \in \delta^-(\{v\})} x_a \quad \text{for all } v \in V.$$

At first glance, a simple way to solve the CHP is to provide the Mixed Integer
Programming model to a general-purpose solver, like CPLEX 8.1. Nevertheless,
this model has the disadvantage of having a large number of continuous variables
and constraints, and therefore a general-purpose solver is very unlikely to help
when solving medium-size instances. The following sections describe algorithms
that decompose this large model into smaller ones to avoid this disadvantage.



3 A First Approach: Dantzig-Wolfe Decomposition

Dantzig and Wolfe [5] proposed a decomposition technique to solve large linear
programming problems, and this principle can be used to solve model (1)-(9).
The basic idea from (5) is that a solution $x$ is a sum of flows, each one
associated with a commodity $k$. A decomposition approach consists in
iteratively solving the continuous relaxation of a master problem (1)-(5), where
for each $k \in K$ the flow variables $[f^k_a : a \in A]$ are replaced by a
convex combination of (extreme) solutions of a subproblem defined by (6)-(9).

In more detail, the procedure is iterative. At each iteration we solve a
continuous relaxation of a master problem defined by an index subset $F_k$ of
flows $g^l$ for each commodity $k \in K$, and then we consider a subproblem for
each commodity $k$ to check if there is a flow with negative reduced cost
(defined with the dual variables from the master). The master problem is defined
by the objective function (1) and the constraints (2)-(4), plus the additional
constraints

$$x_a = \sum_{k \in K} \sum_{l \in F_k} \lambda_l g^l_a \quad \text{for all } a \in A$$

$$\sum_{l \in F_k} \lambda_l = 1 \quad \text{for all } k \in K$$

$$\lambda_l \ge 0 \quad \text{for all } l \in F_k \text{ and for all } k \in K,$$

which in a sense replace (5).

For each commodity, the subproblem is simply a min-cost flow problem (6)-(9).
If a flow $g^l$ with negative reduced cost is found, then it is included in
$F_k$ and another iteration is done. When all flows have non-negative reduced
costs for all commodities, the continuous relaxation of the master problem has
been optimally solved, but a branch-and-bound process may still have to start to
guarantee integrality of the $y$ variables. At each node of the branch-and-bound
procedure the same procedure is performed using the candidate sets $F_k$
produced at the previous nodes.

A minor modification in the model would allow replacing the min-cost flow
computations by shortest path problems. To do that, we must simply observe
that the min-cost flow problem (6)-(9) has a fixed capacity requirement on the
arc $(j, i)$ if $k = (i, j) \in K$, while all other arcs have no capacity upper
bounds. Then the extreme solutions of this polyhedron are clearly defined by
paths from $i$ to $j$. Arc capacities exist in the CHP, but these restrictions
are ensured by constraints (2) in the master problem. Then the flows whose
existence should be checked are uncapacitated flows, which is equivalent to
checking, for each commodity $k = (i, j)$, if $j$ is reachable from $i$ in the
graph defined by the solution $x^*$ of the master problem. Moreover, when the
shortest-path problem replaces the flows, the modification consists in adding to
the master problem the constraint $x_{(j,i)} \ge d_k$ for each commodity
$k = (i, j)$.

The full method is then a column-generation approach at each node of a
branch-and-bound scheme.
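The reachability test mentioned above can be sketched as a breadth-first search on the support graph of $x^*$, i.e., on the arcs carrying positive flow. This is only an illustrative sketch: the function names, the dictionary layout of `x_star` and the threshold `eps` are assumptions of this example, not part of the original method description.

```python
from collections import deque


def reachable(support_arcs, source):
    """Return the set of nodes reachable from `source` along the given arcs."""
    successors = {}
    for u, v in support_arcs:
        successors.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def unreachable_commodities(x_star, commodities, eps=1e-9):
    """List the commodities (i, j) whose destination j cannot be reached
    from i using only arcs with x*_a > eps."""
    support = [a for a, value in x_star.items() if value > eps]
    return [(i, j) for (i, j) in commodities if j not in reachable(support, i)]
```

For example, with `x_star = {("s", "h"): 2.0, ("h", "t"): 0.0, ("h", "u"): 1.0}` the commodity `("s", "u")` is served by the support graph while `("s", "t")` is reported as unreachable.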



4 A Second Approach: Benders’ Decomposition 



Another alternative to avoid the large number of flow variables in model
(1)-(9) is to use Benders’ Decomposition (see, e.g., Benders [3] and Nemhauser
and Wolsey [13]), which is the dual variant of Dantzig-Wolfe Decomposition in
Linear Programming. Indeed, a vector $[y_h : h \in H]$ and a vector
$[x_a : a \in A]$ satisfying (2)-(4) define a feasible solution for the CHP if
and only if, for each $k \in K$, there exists a vector $[f^k_a : a \in A]$ in the
polyhedron defined by (5)-(9). Farkas’ Lemma gives a way of imposing the
existence of such solutions through linear inequalities as follows. Let us
consider a dual variable $\alpha^k_v$ associated with each equation in (6), a
dual variable $\beta^k_{(u,v)}$ associated with each equation in (8)-(9), and a
dual variable $\gamma_{(u,v)}$ associated with each equation in (5). Then, the
polyhedron (5)-(9) is non-empty if and only if all (extreme) rays of the
polyhedral cone

$$\alpha^k_u - \alpha^k_v + \gamma_{(u,v)} \ge 0 \quad \text{for all } (u, v) \in A \setminus \hat{K},\ k \in K$$

$$\alpha^k_u - \alpha^k_v + \beta^k_{(u,v)} + \gamma_{(u,v)} \ge 0 \quad \text{for all } (u, v) \in \hat{K},\ k \in K$$

(with $\hat{K}$ the set of dummy arcs) satisfy

$$\sum_{k=(i,j) \in K} d_k \beta^k_{(j,i)} + \sum_{a \in A} x_a \gamma_a \ge 0. \tag{10}$$

Some extreme rays $(\alpha, \beta, \gamma)$ of this cone are associated with a
subset $S \subset V$ and are defined as

$$\alpha^k_v := \begin{cases} +1 & \text{if } v \in S \\ 0 & \text{otherwise,} \end{cases}
\qquad
\beta^k_a := \begin{cases} -1 & \text{if } a \in \delta^+(S) \\ 0 & \text{otherwise,} \end{cases}
\qquad
\gamma_a := \begin{cases} +1 & \text{if } a \in \delta^-(S) \\ 0 & \text{otherwise.} \end{cases}$$

By replacing these values in (10) we obtain

$$\sum_{a \in \delta^+(S)} x_a \ge \sum_{i \in I \cap S} \sum_{j \in J \setminus S} d_{ij} \tag{11}$$

for all $S \subset V$. Unfortunately, these extreme rays do not generate the
whole cone, and therefore for the moment we do not have an alternative model for
the CHP.

Constraints (11) are known in the literature related to the multi-commodity
flow formulation (see, e.g., Chapter 6 in [9]) and are termed Cut Inequalities
[2] or Bipartition Inequalities [8]. Unfortunately, no polynomial algorithm is
known for separating these inequalities, and the best approach consists in
solving a max-cut problem. Still, it is interesting to notice that they are only
a subfamily of (10), which can be separated by solving a linear programming
problem.
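To make inequality (11) concrete, the sketch below enumerates every proper non-empty subset $S \subset V$ of a tiny instance and reports the subsets for which the flow leaving $S$ is smaller than the demand that must leave $S$. This brute-force enumeration is exponential and is shown only as an illustration of what a cut inequality says; it is not the separation procedure of the paper. The function name and data layout are assumptions, and `x` is assumed to contain only the real cables (no dummy arcs).

```python
from itertools import combinations


def violated_cut_inequalities(nodes, x, demand, tol=1e-9):
    """Return (S, outgoing flow, required demand) for each violated cut (11).

    nodes:  iterable with all nodes of V
    x:      {(u, v): flow on the real arc (u, v)}
    demand: {(i, j): d_ij} with i a source terminal and j a destination
    """
    nodes = list(nodes)
    violated = []
    for size in range(1, len(nodes)):            # proper, non-empty subsets
        for subset in combinations(nodes, size):
            S = set(subset)
            out_flow = sum(val for (u, v), val in x.items()
                           if u in S and v not in S)
            required = sum(d for (i, j), d in demand.items()
                           if i in S and j not in S)
            if out_flow < required - tol:
                violated.append((S, out_flow, required))
    return violated
```

With nodes `s`, `h`, `t`, flows `{("s", "h"): 3, ("h", "t"): 2}` and a demand of 3 units from `s` to `t`, the only violated subset is `{"s", "h"}`: just 2 units leave it although 3 are required.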

The overall approach is then another iterative procedure where a master
problem is solved at each iteration. This master problem is defined by
the objective function (1) and the constraints (2)-(4), plus a set of
inequalities generated by the subproblem. Given a solution $x^*$ of the master
problem, at each iteration the subproblem checks whether the objective function

$$\min \sum_{k=(i,j) \in K} d_k \beta^k_{(j,i)} + \sum_{a \in A} x^*_a \gamma_a$$

over the above cone is unbounded or not. If the subproblem is unbounded, then
the direction $(\beta^k_{(j,i)}, \gamma_a)$ that was found defines a violated
inequality (10) to be added to the new master problem.

A relevant difference when comparing this decomposition approach with the
one presented in the previous section is that constraint (5) is part of the
master problem when using Dantzig-Wolfe decomposition, while it goes to the
subproblem when using Benders’ decomposition.

An observation that is very useful in practice is that, for each commodity
$k = (i, j)$, only the variable $\beta^k_{(j,i)}$ is necessary in (10), and
therefore it is not necessary to overload the subproblem by considering all the
variables $\beta^k_{(u,v)}$ for $(u, v) \neq (j, i)$ in the cone. This
consideration substantially reduces the size of the subproblem solved at each
iteration.

5 A Third Approach: Double Decomposition 

The approach presented in the previous section has the inconvenience of re- 
quiring a large subproblem to be solved at each iteration. A way to avoid this 
disadvantage consists in applying, once again, a decomposition approach and 
to solve the subproblem by another iterative procedure similar to the one used 
with the master problem. More precisely, let us consider the master problem as 
in the above Benders’ decomposition approach. As before, during the iterative 
procedure a subproblem will generate violated constraints when it is unbounded, 
until the optimal solution of the subproblem is zero (i.e., it is not unbounded). 

Instead of solving the subproblem by using a linear programming solver,
we write its dual program and observe that it is a linear program with a very
reasonable number of variables and with an exponential number of constraints.
The advantage is that not all the constraints are necessary simultaneously, and
the most violated one can be found by solving shortest path problems. The
details are as follows. Given a solution $x^*$ of the master problem, the
associated subproblem consists of checking the feasibility of the linear system
(5)-(9) for $x = x^*$. By Farkas’ lemma, this is equivalent to checking the
potential unboundedness of the objective function

$$\min \sum_{k \in K} d_k \beta_k + \sum_{a \in A} x^*_a \gamma_a$$

over the directions $(\beta, \gamma)$ of the cone described by the collection of
inequalities

$$\sum_{a \in A} z^l_a \gamma_a + \beta_k \ge 0 \quad \text{for all } k \in K,$$

$z^l$ being the characteristic vector of a path going from $i$ to $j$ for each
commodity $k = (i, j) \in K$, and by the trivial constraints

$$\gamma_a \ge 0 \quad \text{for all } a \in A$$

$$\beta_k \le 0 \quad \text{for all } k \in K.$$

We are reducing the feasible region of this problem by considering signs on
the variables, based on the property that one can replace, without loss of
generality, all the equalities in (6) by inequalities of type “$\le$”, and all
the equalities in (8) by inequalities of type “$\ge$”.

Observe that one could expect the variables $z^l$ to define flows from $i$ to
$j$ instead of paths, but only the extreme flows are necessary to generate all
of them and the subproblem has no capacities on the arcs, as mentioned at the
end of Section 3. Moreover, once again, for each commodity $k = (i, j) \in K$
one could expect to have variables $\beta^k_{(u,v)}$ for all $(u, v)$, but only
the one associated with arc $(j, i)$ is necessary if the other dummy arcs
$(u, v) \in \hat{K}$ are removed when generating the paths.

Finally, even in this dual version, the linear program is difficult to solve if
all the constraints are given to a linear programming solver. The advantage of
this approach, however, is that not all the constraints are necessary
simultaneously and the most violated one (if any) by a given solution
$(\beta^*, \gamma^*)$ can be generated by solving shortest path problems.
Observe that for each commodity $k = (i, j)$, $z^l$ is a path going from $i$ to
$j$. More precisely, for each commodity $k \in K$, let us compute the shortest
path on the graph $G_k = (\{i, j\} \cup H, A \setminus \hat{K})$, where each arc
$a$ is associated with a cost $\gamma^*_a$. Let $z^l$ be the characteristic
vector defined by setting $z^l_a$ to 1 if arc $a$ is in the path used by the
commodity $k$, and to 0 otherwise. If
$\sum_{a \in A} \gamma^*_a z^l_a + \beta^*_k < 0$, then the vector $z^l$
generates a violated constraint to be added to the subproblem, which must be
reoptimized. When $z^l$ does not induce a violated inequality for any of the
commodities, the subproblem is solved, and the last unbounded direction
$(\beta^*, \gamma^*)$ defines a violated constraint for the master problem. The
iterative procedure on the master problem stops when the subproblem finds the
zero vector as the optimal solution.
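The separation step just described can be sketched with Dijkstra's algorithm, which applies because the arc costs $\gamma^*_a$ are non-negative. The code below is a minimal illustration under assumptions of this example (function names, dictionary layout, tolerance); it restricts each search to the nodes $\{i, j\} \cup H$ and reports, per commodity, a path whose cost plus $\beta^*_k$ is negative.

```python
import heapq


def shortest_path(arc_cost, source, target):
    """Dijkstra on non-negative costs {(u, v): cost}.
    Returns (length, list of arcs), or (inf, []) if target is unreachable."""
    successors = {}
    for (u, v), cost in arc_cost.items():
        successors.setdefault(u, []).append((v, cost))
    dist = {source: 0.0}
    pred = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, cost in successors.get(u, []):
            new_d = d + cost
            if new_d < dist.get(v, float("inf")):
                dist[v] = new_d
                pred[v] = u
                heapq.heappush(heap, (new_d, v))
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:
        path.append((pred[node], node))
        node = pred[node]
    path.reverse()
    return dist[target], path


def separate_path_constraints(gamma, beta, routers, tol=1e-9):
    """For each commodity (i, j) with value beta[(i, j)], return a path whose
    constraint  sum_a gamma_a z_a + beta_k >= 0  is violated, if any."""
    violated = {}
    for (i, j), beta_k in beta.items():
        allowed = set(routers) | {i, j}          # node set of G_k
        arcs_k = {(u, v): c for (u, v), c in gamma.items()
                  if u in allowed and v in allowed}
        length, path = shortest_path(arcs_k, i, j)
        if length + beta_k < -tol:
            violated[(i, j)] = path
    return violated
```

Since the shortest path minimizes the left-hand side over all paths of a commodity, finding no violation for the shortest path certifies that no path constraint of that commodity is violated.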

Once again, this mechanism allows solving the continuous relaxation of the
master problem, hence a branch-and-bound procedure must start if a $y_h$
variable has a fractional value. At each node of the branch-and-bound procedure
the same iterative procedure can be applied.
During the process three programs are solved: a master problem by using a
linear programming solver, a subproblem by a linear programming solver, and a
shortest path problem by a specialized algorithm. An advantage of the process
is that during the different iterations the region of the three solved programs
remains the same. Indeed, all the cuts generated for the master problem and
for the subproblems can be reused in the subsequent solves. Also, the shortest
path problems can always be computed on the same graphs. The difference between
iterations consists only in the different coefficients in the objective function.



6 Computational Experiments 

The decomposition techniques described in Sections 3 and 4 have similar
performances, while the one in Section 5 involves a more elaborate way of
solving the subproblem. This section shows the results of experiments conducted
on some random instances with our implementations of the decomposition
approaches in Sections 4 and 5. These experiments were carried out on a
1500 MHz Pentium IV using the branch-and-cut framework of CPLEX 8.1.

The instance generator is described next. Given the sets $I$, $J$, $H$, the arc
density of the graph induced by $H$ was fixed to a parameter taking values in
{30%, 50%, 85%}, which gave instances with low, medium and high density. The
percentage of arcs from $(I \times H) \cup (H \times J)$ was fixed to 80%. These
settings were chosen in order to make it likely that the instances have feasible
solutions. The amount of information $d_{ij}$ from $i \in I$ to $j \in J$ was
generated in $[1, 5]$. The capacities $q_h$ and $q_a$ were generated in
$[1, |I \times J| \cdot 2.5]$. The costs $c_h$ and $c_a$ were generated in
$[1, 50]$. We considered $(|I|, |J|, |H|)$ in
$\{(2, 2, 4), (3, 3, 5), (5, 5, 5), (5, 5, 10)\}$, and for each triplet we
generated five random instances with the above features.
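A generator along these lines could be sketched as follows. Several details are assumptions of this example rather than facts from the text: the probability distributions (uniform), integer demands and costs, real-valued capacities, selecting an exact number of arcs by sampling without replacement, and all identifiers.

```python
import random


def generate_instance(n_src, n_dst, n_hub, density, seed=0, terminal_share=0.8):
    """Random CHP instance with the features described in the text.

    density:        fraction of the ordered router pairs that become arcs
    terminal_share: fraction of the (I x H) and (H x J) pairs that become arcs
    """
    rng = random.Random(seed)
    sources = [f"i{n}" for n in range(n_src)]
    dests = [f"j{n}" for n in range(n_dst)]
    hubs = [f"h{n}" for n in range(n_hub)]

    hub_pairs = [(u, v) for u in hubs for v in hubs if u != v]
    term_pairs = ([(i, h) for i in sources for h in hubs]
                  + [(h, j) for h in hubs for j in dests])
    chosen = (rng.sample(hub_pairs, round(density * len(hub_pairs)))
              + rng.sample(term_pairs, round(terminal_share * len(term_pairs))))

    cap_max = 2.5 * n_src * n_dst                 # upper end of the capacity range
    arcs = {a: {"capacity": rng.uniform(1, cap_max), "cost": rng.randint(1, 50)}
            for a in chosen}
    routers = {h: {"capacity": rng.uniform(1, cap_max), "cost": rng.randint(1, 50)}
               for h in hubs}
    demand = {(i, j): rng.randint(1, 5) for i in sources for j in dests}
    return {"I": sources, "J": dests, "H": hubs,
            "arcs": arcs, "routers": routers, "demand": demand}
```

Calling `generate_instance(5, 5, 10, 0.85, seed=3)` would produce one high-density instance of the largest size considered; fixing the seed makes the instance reproducible.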

Table 1 shows average results of applying the two decomposition approaches 
on feasible instances. Column headings display: 

— The density (d) of the generated graphs: low (l), medium (m) or high (h).

— The percentage ratio LB/opt, where LB is the objective value of the LP- 
relaxation computed at the root node of the branch-and-cut tree, and opt is 
the optimal CHP solution value. 

— The time in seconds spent in the root node (r-time), number of nodes in 
the branch-and-cut tree (nodes), total computing time (time) and separated 
inequalities (cuts) used by the Benders’ decomposition algorithm. 

— The time in seconds spent in the root node (r-time’), number of nodes in 
the branch-and-cut tree (nodes’), total computing time (time’), inequalities 
introduced in the master problem (cuts’), and inequalities introduced in the 
subproblems (cuts”) while applying the Double Decomposition algorithm. 

The table clearly shows that the Double Decomposition algorithm outper- 
forms the classical Benders’ approach. Indeed, in our experiments it was about 
ten times faster on the largest instances. This is explained by the fact that the 
Double Decomposition exploits the combinatorial structure of the subproblem, 
while Benders’ Decomposition solves it as a standard linear program. 
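The speed-up on the largest instances can be recomputed from the total computing times reported in Table 1 for $(|I|, |J|, |H|) = (5, 5, 10)$. The short sketch below only restates those published averages; the variable names are assumptions of this example.

```python
# Average total computing times (seconds) for the (5, 5, 10) instances in
# Table 1, indexed by density: low (l), medium (m), high (h).
benders_time = {"l": 148.6, "m": 219.1, "h": 297.7}
double_time = {"l": 12.2, "m": 18.4, "h": 32.3}

# Ratio of Benders' time to Double Decomposition time for each density.
speedup = {d: benders_time[d] / double_time[d] for d in benders_time}

for d, ratio in speedup.items():
    print(f"density {d}: {ratio:.1f} times faster")
```

The ratios lie between roughly 9 and 12, which is consistent with the statement that the Double Decomposition was about ten times faster on the largest instances.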
In both approaches, the number of inequalities reported in the column cuts
refers to the cuts added to the master problem, which is slightly smaller than
the number of subproblems solved. In the Benders’ approach the subproblems
are solved as black boxes by an LP solver, while in the Double Decomposition
each subproblem requires solving a sequence of shortest path problems.
The number of path problems is slightly larger than the number of cuts added
to the subproblem, which tends to be a very small number. Column cuts” in
Table 1 shows the total number of inequalities added to all the subproblems
while, on average, the number of shortest path inequalities added to each
subproblem varies between 4 for the smallest instances and 12 for the largest
ones. This explains the better performance of the new technique.



Table 1. Results on random instances

                            Benders’ Decomp.                 Double Decomp.
(|I|,|J|,|H|)  d   LB/opt  r-time   nodes    time    cuts r-time’  nodes’   time’   cuts’   cuts”
(2,2,4)        l     98.0     0.1     1.6     0.1     1.4     0.1     1.6     0.1     1.4    11.4
               m     99.8     0.1     1.2     0.1     1.2     0.1     1.2     0.1     2.2    13.4
               h     99.6     0.1     1.2     0.1     1.4     0.2     1.2     0.2     1.4    14.8
(3,3,5)        l     99.0     0.2     2.5     0.2     4.8     0.3     2.8     0.3     5.5    35.3
               m     99.8     0.3     3.4     0.3     7.4     0.3     2.8     0.3     6.0    39.6
               h     96.9     0.3     3.4     0.3     7.6     0.5     3.4     0.5     7.4    63.0
(5,5,5)        l     99.6    18.1     1.8    18.4    25.8     1.6     1.8     1.6    24.8    92.0
               m     99.6    18.8     1.5    18.8    25.8     1.9     1.5     2.0    24.8   114.8
               h     99.6    21.1     2.3    21.1    23.8     2.5     3.2     2.5    23.0   155.8
(5,5,10)       l     94.9   123.4    12.8   148.6    65.2    10.6    15.6    12.2    69.0   394.0
               m     95.1   176.2    13.4   219.1    72.2    16.1    13.4    18.4    64.4   611.8
               h     96.3   225.2    12.0   297.7    66.8    29.0    11.8    32.3    62.0  1144.6


7 Conclusions 

We have introduced the combinatorial optimization problem of deciding the
hubs to be opened in a telecommunications network to send given demands of
information from source terminals to destination terminals at minimum cost.
We do not assume that all the hubs are connected by arcs, as happens
in works related to hub location in the literature. Capacities on the arcs and
hubs are considered, and therefore the problem is referred to as the
Capacitated Hub Problem.

This problem is very closely related to the well-known Multi-commodity Flow 
Problem in Network Design, but differs in the fact that our objective function is 
not piece-wise in the total flow traversing an arc. The complexity in our problem 
is due to the cost for opening hubs, thus managing a zero-one variable for each 
node. 

This paper presents a Mixed Integer Linear Programming model. The large
numbers of continuous variables and constraints create difficulties when
applying a general-purpose solver like CPLEX 8.1, and therefore we have
presented different approaches to solve it by decomposition techniques. The
first two approaches are based on single decomposition techniques, while the
third one applies a two-level nested decomposition scheme.
Abstract. With the aim of supporting the process of adapting railway 
infrastructure to present and future traffic needs, we have developed a 
method to build train timetables efficiently. In this work, we describe the 
problem in terms of constraints derived from railway infrastructure, user 
requirements and traffic constraints, and we propose a method to solve 
it efficiently. This method carries out the search by assigning values to 
variables in a given order and verifying the satisfaction of constraints 
where these are involved. When a constraint is not satisfied, a guided 
backtracking is done. The technique reduces the search space allowing 
us to solve real and complex problems efficiently. 



1 Introduction 

The train scheduling problem is basically an optimization problem which is
computationally difficult to solve. Several models and methods have been
analyzed to solve it [1], [2]. In our method, we consider a heterogeneous
railway network and we focus on adding new trains to a railway network that is
already occupied by trains in circulation. The models proposed above are not
efficient for the type of problem that we are considering. The majority of the
papers published in the area of periodic timetabling in the last decade are
based on the Periodic Event Scheduling Problem (PESP) introduced by Serafini and
Ukovich [8]. Specifically, an efficient model that uses the PESP and the concept
of symmetry is proposed in [6]. However, we cannot use the concept of symmetry
because: (i) we allow different types of trains, and this does not guarantee the
symmetry needed to use these models; and (ii) the use of the infrastructure may
not be symmetrical. There are related works on railway problems such as the
following: allocation of n trains in a station minimizing the number of used
tracks and allowing the departure of trains in the correct order [4], allocation
of new stations through the railway network to increase the number of users [7],
etc. There are tools to solve certain kinds of problems, such as the
Rescheduling tool [3] or the TUFF scheduler [5]. The Rescheduling tool allows
the user to modify a timetable when trains in a section of track cannot run
according to the infrastructure, ensuring that scheduling rules are not
violated. The TUFF scheduler describes a constraint model and solver for
scheduling trains on a network of single tracks used in both directions.
However, our problem differs from the type of problems dealt with by the methods
mentioned above. In the following section, we explain in more detail the type of
problem that we have dealt with.

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 164-173, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

2 Problem Specification 

We propose to add new trains on a heterogeneous, heavily loaded railway
network, minimizing the traversal time of each new train. The timetables for the
new trains are obtained in a search space that is limited by traffic constraints,
user requirements, railway infrastructure and network occupation. The problem
specification does not require that all considered trains visit the same sequence
of locations. There may be many types of trains, which implies different
velocities, safety margins, commercial stops and journeys. Our method takes into
account the following scenario to generate the timetables corresponding to the
new trains:

1. two sets of ordered locations $L_D = \{l_k, l_{k+1}, \ldots, l_{k+m}\}$ and
$L_U = \{l_h, l_{h+1}, \ldots, l_{h+n}\}$, such that
$\{\exists\, i, j \mid l_i \in L_D \wedge l_{i+1} \in L_D \wedge l_j \in L_U \wedge l_{j+1} \in L_U \wedge l_i = l_{j+1} \wedge l_{i+1} = l_j\}$.
A pair of adjacent locations can be joined by a single or double track section.
$L_U$ and $L_D$ correspond to the journeys of trains going in the up and down
directions, respectively.

2. a set of trains for each direction. $T_D = \{t_0, t_2, \ldots, t_d\}$ is the
set of trains that visit the locations in $L_D$ in the order given by this
sequence, going in the down direction. $T_U = \{t_1, t_3, \ldots, t_u\}$ is the
set of trains that visit the locations in $L_U$ in the order given by this
sequence, going in the up direction. The subscript $i$ in the variable $t_i$
indicates the departure order among the new trains going in the same direction.

3. a journey for each set of trains ($T_D$ and $T_U$), which specifies the
traversal time for each section of track in $L_D$ and in $L_U$
($R_{j \to j+1}$), and the minimum stop time ($S_j$) for commercial purposes in
each $l_j$.

Considering $t_y\,\mathit{dep}\_l_x$ and $t_y\,\mathit{arriv}\_l_x$ as the
departure and arrival times of train $t_y$ from/at location $l_x$, the problem
consists of finding the running map that minimizes the average traversal time,
satisfying all the following constraints:

— Initial Departure Time. The first train must leave from the first station of
its journey within a given time interval ($[min_D, max_D]$ for trains going in
the down direction and $[min_U, max_U]$ for trains going in the up direction).

$$min_D \le t_0\,\mathit{dep}\_l_k \le max_D \ \wedge\ min_U \le t_1\,\mathit{dep}\_l_h \le max_U. \tag{1}$$

Fig. 1. Reception and Expedition time constraint

— Frequency of Departure. It specifies the period ($F_U$/$F_D$) between
departures of two consecutive trains in each direction from the same location,

$$\{\forall t_i, l_j \mid t_i \in \{T_D - \{t_d\}\} \wedge l_j \in \{L_D - \{l_{k+m}\}\}\},\quad t_{i+2}\,\mathit{dep}\_l_j = t_i\,\mathit{dep}\_l_j + F_D. \tag{2}$$

$$\{\forall t_i, l_j \mid t_i \in \{T_U - \{t_u\}\} \wedge l_j \in \{L_U - \{l_{h+n}\}\}\},\quad t_{i+2}\,\mathit{dep}\_l_j = t_i\,\mathit{dep}\_l_j + F_U. \tag{3}$$

— Minimum Stops. A train must stay in a location $l_j$ at least $S_j$ time units,

$$\{\forall t_i, l_j \mid t_i \in \{T_D \cup T_U\} \wedge l_j \in \{L_D \cup L_U - \{l_k, l_h, l_{k+m}, l_{h+n}\}\}\},\quad t_i\,\mathit{dep}\_l_j \ge t_i\,\mathit{arriv}\_l_j + S_j. \tag{4}$$

— Exclusiveness. A single track section must be occupied by only one train at
the same time.

$$\{\forall t_i, t_j, l_x, l_y \mid t_i \in T_D \wedge t_j \in T_U \wedge l_x \in L_D - \{l_{k+m}\} \wedge l_y \in L_U - \{l_h\} \wedge l_x = l_{y+1} \wedge l_{x+1} = l_y\},$$
$$t_j\,\mathit{dep}\_l_y \ge t_i\,\mathit{arriv}\_l_y \ \vee\ t_i\,\mathit{dep}\_l_x \ge t_j\,\mathit{arriv}\_l_x. \tag{5}$$

— Reception Time. At least $R_x$ time units are required at location $l_x$
between the arrival times of two trains going in opposite directions
(Figure 1.a).

$$\{\forall t_i, t_j, l_x \mid t_i \in T_D \wedge t_j \in T_U \wedge l_x \in \{L_D - \{l_k\} \cap L_U - \{l_h\}\}\},$$
$$t_j\,\mathit{arriv}\_l_x \ge t_i\,\mathit{arriv}\_l_x + R_x \ \vee\ t_i\,\mathit{arriv}\_l_x \ge t_j\,\mathit{arriv}\_l_x + R_x. \tag{6}$$

— Expedition Time. At least $E_x$ time units are required at location $l_x$
between the arrival and departure times of two trains going in opposite
directions (Figure 1.b).

$$\{\forall t_i, t_j, l_x \mid t_i \in T_D \wedge t_j \in T_U \wedge l_x \in \{L_D - \{l_{k+m}\} \cap L_U - \{l_{h+n}\}\}\},$$
$$t_j\,\mathit{dep}\_l_x \ge t_i\,\mathit{arriv}\_l_x + E_x \ \vee\ t_i\,\mathit{dep}\_l_x \ge t_j\,\mathit{arriv}\_l_x + E_x. \tag{7}$$

— Precedence Constraint. Each train employs a given time interval
($R_{x \to x+1}$) to traverse each section of track ($l_x \to l_{x+1}$) in each
direction.

$$\{\forall t_i, t_j, l_x, l_y \mid t_i \in T_D \wedge t_j \in T_U \wedge l_x \in \{L_D - \{l_{k+m}\}\} \wedge l_y \in \{L_U - \{l_{h+n}\}\}\},$$
$$t_i\,\mathit{arriv}\_l_{x+1} = t_i\,\mathit{dep}\_l_x + R_{x \to x+1} \ \wedge\ t_j\,\mathit{arriv}\_l_{y+1} = t_j\,\mathit{dep}\_l_y + R_{y \to y+1}. \tag{8}$$

— Capacity of Each Station. The number of trains that may stay simultaneously
in a station depends on the number of available tracks in it.

— Closure Times. Traffic operations and/or passing of trains are not allowed
at the closing times of a station.
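Three of the constraints above can be expressed as simple predicates on departure and arrival times. The sketch below is only an illustration: the function names and the use of plain numbers (for instance, minutes since midnight) as time values are assumptions of this example, and each predicate covers a single train pair or a single track section rather than the whole timetable.

```python
def frequency_ok(departures, period):
    """Constraints (2)-(3): consecutive trains of one direction leave the
    same location exactly `period` time units apart."""
    return all(later - earlier == period
               for earlier, later in zip(departures, departures[1:]))


def min_stop_ok(arrival, departure, stop_time):
    """Constraint (4): a train stays at a location at least `stop_time`."""
    return departure >= arrival + stop_time


def exclusive_ok(down_dep, down_arr, up_dep, up_arr):
    """Constraint (5) for one single-track section shared by a down train and
    an up train: one of them enters only after the other has arrived at the
    far end of the section."""
    return up_dep >= down_arr or down_dep >= up_arr
```

For instance, a down train occupying a section from minute 0 to minute 10 is compatible with an up train entering at minute 12, but not with one entering at minute 5 and arriving at minute 15.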

3 Sequential Algorithm 

In this section, we explain the algorithm that is used to solve the problem
specified in Section 2. We have named it “Sequential” because of the way that it
generates the timetable for each new train (Figure 2). At each iteration, the
Sequential algorithm builds a subset of the whole search space, in which it
searches for values of the problem variables that satisfy all the problem
constraints (Section 2). The assignment of valid values to the problem variables
generates a timetable (if there is a feasible solution in the subset) for each
new train (lines 08-14 in Figure 2). The elements of the reduced search space
depend on the following values:

1. Initial departure time for the first train going in the down direction
(init_dep_down in Figure 2). A value belonging to the time interval
$[min_D, max_D]$ is chosen randomly at each iteration (Constraint 1 in
Section 2). This time interval is part of the input parameters I (one of the
input parameters of the Sequential Algorithm in Figure 2) given by the final
user.

2. Initial departure time for the first train going in the up direction
(init_dep_up in Figure 2). A value belonging to the time interval
$[min_U, max_U]$ is chosen randomly at each iteration (Constraint 1 in
Section 2). This time interval is part of the input parameters I (one of the
input parameters of the Sequential Algorithm in Figure 2) given by the final
user.



01 procedure Sequential_Algorithm(I, C)
02 begin
03   while (Enough_Time())
04     S = Generate_Set_Ref_Station()
05     ref_st = Get_Ref_Station(S)
06     init_dep_down = Get_Init_Dep_Time(min_D, max_D)
07     init_dep_up = Get_Init_Dep_Time(min_U, max_U)
08     sched1 = Get_Partial_Sched(init_dep_down, l_k, ref_st, T_D)
09     sched2 = Get_Partial_Sched(init_dep_up, l_h, ref_st, T_U)
10     init_dep_down = t_0 arriv_l_ref_st + S_ref_st
11     init_dep_up = t_1 arriv_l_ref_st + S_ref_st
12     sched3 = Get_Partial_Sched(init_dep_down, ref_st, l_k+m, T_D)
13     sched4 = Get_Partial_Sched(init_dep_up, ref_st, l_h+n, T_U)
14     new_sched = sched1 + sched2 + sched3 + sched4
15     if (Is_Better(new_sched, best_sched))
16       best_sched = new_sched
17   end while
18   Show(best_sched)
19 end



Fig. 2. Sequential Algorithm 
3. Reference Station (ref_st in Figure 2). When the value assigned to a problem
variable that represents the departure time of a train violates Constraint
5 defined in Section 2 (to avoid crossings between trains going in opposite
directions), the process must decide which of the two trains will have to wait
for the release of the track section. The decision taken by the process
establishes a priority order between the trains competing for the same resource,
the single track section.

When the two trains competing for a single track section are a train in
circulation and a new train that should be added to the railway network,
the priority order will always be the same. The new train will have to wait
until the train in circulation releases the single track section.

When the two trains competing for a single track section are both new
trains, the priority order is decided according to the position of the single
track section with respect to one station, which we name the reference
station. The reference station divides the journey of each new train into two
parts: the first part goes from the initial station of the journey to the
reference station; the second part goes from the reference station to the last
station of the journey. Each train will have priority on the single track
sections that belong to the first part of its journey. In Figure 3, S2 is the
reference station and it divides the journey of each new train into two parts.
For the trains going in the down direction ($t_0$, $t_2$ and $t_4$), the first
part of their journey is composed of the track sections (S0-S1) and (S1-S2); the
second part of their journey is composed of (S2-S3) and (S3-S4). For the trains
going in the up direction ($t_1$, $t_3$ and $t_5$), the first part of their
journey is composed of (S4-S3) and (S3-S2); the second part of their journey is
composed of (S2-S1) and (S1-S0). In Figure 3, the possible crossing between the
trains $t_0$ and $t_1$ (in case $t_0$ left S2 as soon as possible) is
highlighted by a circle.
The single track section (S2-S3) belongs to the first part of the journey of 
£1, and therefore this train has greater priority than to on this track section. 
Then, £ 0 will have to wait until the single track section ( S2-S3 ) is released by 
the train t\. The dotted lines represent the position of the train £ 0 if it had 
left from the station S2 as soon as possible. The continuous lines represent 




Fig. 3. Priority Order between new trains defined by a Reference Station 
the real departure time of train t_0 after the single track section has been
released by t_1. The same reasoning applies to the other cases in Figure 3,
which are also marked by circles.
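
The priority rule defined by the reference station can be sketched in a few lines. This is only an illustration under our own assumptions: stations are numbered along the down direction, a track section is a pair of adjacent station indices, and the function name has_priority is hypothetical (it does not appear in Figure 2).

```python
def has_priority(direction, section, ref_index):
    """Return True if a new train running in `direction` has priority on `section`.

    `section` is a pair (i, i + 1) of adjacent station indices, numbered along
    the down direction; `ref_index` is the index of the reference station.
    """
    lower, upper = section
    if direction == "down":
        # First part of a down journey: sections before the reference station.
        return upper <= ref_index
    # First part of an up journey: sections beyond the reference station.
    return lower >= ref_index


stations = ["S0", "S1", "S2", "S3", "S4"]
ref = stations.index("S2")  # reference station of Figure 3
for sec in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    winner = "down" if has_priority("down", sec, ref) else "up"
    print(f"({stations[sec[0]]}-{stations[sec[1]]}): priority for {winner} trains")
```

With S2 as the reference station, the section (S2-S3) is assigned to the up trains, which matches the case of t_0 waiting for t_1 described above.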

The sequential algorithm keeps obtaining new solutions until the time given
by the user has been used up or until the user interrupts the execution
(line 03 in Figure 2). At each iteration, the sequential algorithm compares
the scheduling obtained with the best scheduling obtained up to that point. The
scheduling that produces the least average traversal time for each new train is the
best scheduling. The function Is_Better(new_timetable, best_timetable) returns
TRUE if the new scheduling (new_sched in Figure 2, obtained in the current
iteration) is better than the best scheduling (best_sched in Figure 2, obtained
until that time). Finally, the sequential algorithm returns the best timetable that
has been obtained during the running time.
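
The overall loop of Figure 2 can be sketched as follows. This is a simplified illustration, not the authors' C++ implementation: build_schedule and cost are hypothetical callables standing in for lines 08-14 and for the average traversal time, and the reference station is drawn uniformly at random, which is an assumption of the sketch.

```python
import random
import time


def sequential_search(build_schedule, cost, ref_stations, window_down,
                      window_up, time_budget_s, rng=None):
    """Keep the best schedule found while the time budget lasts."""
    rng = rng or random.Random()
    best, best_cost = None, float("inf")
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:            # Enough_Time()
        ref_st = rng.choice(ref_stations)         # assumed: uniform choice
        init_down = rng.uniform(*window_down)     # value in [min_D, max_D]
        init_up = rng.uniform(*window_up)         # value in [min_U, max_U]
        sched = build_schedule(ref_st, init_down, init_up)
        if sched is None:                         # no feasible solution in this subset
            continue
        c = cost(sched)
        if c < best_cost:                         # Is_Better(new_sched, best_sched)
            best, best_cost = sched, c
    return best, best_cost


if __name__ == "__main__":
    # Toy stand-ins: the "schedule" is just the parameter triple.
    toy_build = lambda ref, down, up: (ref, down, up)
    toy_cost = lambda s: abs(s[1] - 10.0) + abs(s[2] - 20.0)
    print(sequential_search(toy_build, toy_cost, ["S1", "S2", "S3"],
                            (0.0, 30.0), (0.0, 30.0), 0.05, random.Random(1)))
```

Each iteration explores only the subset of the search space fixed by the three sampled values, so the quality of the result depends on how many iterations fit in the time budget.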

We decided not to use general solvers such as LINGO or CPLEX because
we needed binary integer variables to model some constraints in a mathematical
model, and the resulting complexity was too high for real-world problems. For
this reason, we implemented a custom solver, which uses knowledge about this
type of problem to solve it more efficiently.

3.1 Description of an Iteration of the Sequential Algorithm 

At each iteration, the sequential algorithm generates a complete scheduling for
each group of trains (T_D and T_U) in two steps. In Figure 2, sched1 and sched2
correspond to the scheduling generated for the first part of the journey of each
new train going in the down and the up direction, respectively. In Figure 2, sched3
and sched4 correspond to the scheduling generated for the second part of the
journey of each new train going in the down and the up direction, respectively.
The complete scheduling is generated in this way in order to give each new train
greater priority on the first part of its journey. If a crossing is possible
in a track section, priority is given
to the new train whose timetable was assigned first on that track section.
The first and second parts of a journey are determined by the reference
station chosen at the current iteration. Figure 4 shows how the scheduling for one part
of the whole journey is generated. Verify_Constraints(st, next_st, dep_time, t_i)
verifies that all problem constraints are satisfied by the train t_i in the track
section delimited by the stations st and next_st. This function returns the time
that must be added to the departure time of t_i in order to satisfy the violated
constraint. If the function returns 0, then no constraint has been violated.
Consider the set L' = {l_x, l_{x+1}, ..., l_{x+p}} as the ordered set of locations visited by
the train t_i, from l_x = st to l_{x+p} = next_st. This function assigns values to the
variables t_i_dep_l_j and t_i_arriv_l_h such that x <= j < x+p and x < h <= x+p
(Figure 5). Figure 5 shows an example of how the constraint that avoids a
crossing between trains going in opposite directions is verified. Consider that train
t_2 is the train t_i (the train whose timetable is being created), and t_1 is the train
whose timetable has been created first and cannot be modified. The initial value
assigned as departure time from st = l_x to t_2 causes a crossing of t_1 with t_2 (see
procedure Get_Partial_Sched(init_dep, first_st, last_st, T)
begin
  st=first_st
  dep_time=init_dep
  while (st != last_st)
    next_st=Get_Next_St(st)
    i=Get_First_Train(T)
    while (t_i != NULL)
      error=Verify_Constraints(st, next_st, dep_time, t_i)
      if (error>0)
        if (Is_Required_Frequency())
          i=Get_First_Train(T)
          dep_time=t_i_dep_l_st+error
        end if
      else
        if (Is_Required_Frequency())
          dep_time=t_i_dep_l_st+F_T
        else
          dep_time=t_i_arriv_l_st+S_T
        end if
        i=i+2
      end if
    end while
    st=next_st
  end while
end



Fig. 4. Partial Scheduling 
int function Verify_Constraints(st, next_st, dep_time, t_i)
begin
  k=st
  m=next_st
  trav_time=Get_Traversal_Time(st, next_st)
  t_i_dep_l_k=dep_time
  t_i_arriv_l_m=dep_time+trav_time
  error=Verify_Crossing(t_i_dep_l_k, t_i_arriv_l_m)
  if (error=0)
    error=Verify_Overtaking(t_i_dep_l_k, t_i_arriv_l_m)
    if (error=0)
      error=Verify_Availability_Tracks(t_i_arriv_l_m)
      if (error=0)
        error=Verify_Closure_Time(t_i_arriv_l_m)
        if (error=0)
          Set_Timetable(st, next_st)
        end if
      end if
    end if
  end if
  return error
end

Fig. 5. Constraints Verification and Timetable Assignment 



the picture on the left in Figure 6). The function Verify_Constraints computes
the time that must be added to dep_time in order to avoid this crossing, and

Fig. 6. A crossing detected in a section of track



this value is returned by the function. The picture on the right in Figure 6 shows
the necessary delay in the departure time of t_2 from st.

The precedence constraint (Constraint 8 in Section 2) is used to propagate
the values among the variables (trav_time in Figure 5 is the time spent going
from st to next_st). The minimum stop constraint (Constraint 4) is used to compute
the departure time of a train from its arrival time at the same location (S_T
in Figure 4). Get_Next_Station(st, T) returns the station that follows st in the
journey corresponding to trains belonging to T. Get_First_Train(T) returns
the first train that starts the journey corresponding to trains belonging to T.
Is_Required_Frequency(st) returns TRUE if all the trains must keep the same
departure frequency in the station st.
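
The way a verification function turns a detected crossing into a delay can be illustrated with a small sketch. It is a deliberately reduced model and not the paper's Verify_Crossing: the single track section is described only by the time intervals during which opposing, already scheduled trains occupy it, and the names crossing_delay and first_feasible_departure are hypothetical.

```python
def crossing_delay(dep_time, trav_time, occupied):
    """Time to add to dep_time so that the train does not meet an opposing
    train inside the single-track section (0 if there is no conflict).

    `occupied` lists (enter, leave) intervals during which trains running in
    the opposite direction, whose timetable is fixed, hold the section.
    """
    arrival = dep_time + trav_time
    for enter, leave in sorted(occupied):
        if dep_time < leave and enter < arrival:   # the intervals overlap
            return leave - dep_time                # wait until the section is released
    return 0


def first_feasible_departure(dep_time, trav_time, occupied):
    """Re-verify after every delay until no constraint is violated."""
    delay = crossing_delay(dep_time, trav_time, occupied)
    while delay > 0:
        dep_time += delay
        delay = crossing_delay(dep_time, trav_time, occupied)
    return dep_time


print(crossing_delay(10, 5, [(12, 20)]))                      # conflict with one opposing train
print(first_feasible_departure(10, 5, [(12, 20), (22, 30)]))  # two opposing trains in a row
```

Returning the delay instead of a boolean lets the caller add it to the departure time and verify again, which is how the value returned by Verify_Constraints is used in Figure 4.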



4 Computational Results 

The algorithm just described was implemented in C++ and tested on a set of
instances from the National Network of Spanish Railways (RENFE), which has
collaborated closely with the research group. We used an Intel Pentium 4
1.6 GHz processor to test the algorithm. Figure 7 shows the average
traversal times obtained by our algorithm as time elapses for two different
instances of the problem specified in Section 2. The picture on the left in
Figure 7 shows the results for the problem of adding 10 new trains in each
direction to a railway network occupied by 81 trains in circulation. The main
journey of this network is Madrid-Zaragoza (65 locations), but there are different
types of trains with different journeys, which are subsets of the main journey.
The first new train must leave within the time interval [7:00:00,8:30:00] and the
departure frequency between trains in the same direction must belong to the
time interval [0:10:00,0:50:00]. The picture on the right in Figure 7 corresponds
to the same problem but without taking into account trains in circulation on the
railway network. We have not been able to obtain results of similar methods for
comparison with our algorithm. Therefore, we modeled our problem in order to solve
it using LINGO and CPLEX. The models solved by these general solvers do not
take into account the constraints on the number of tracks and the closure time in
each station. The instance tested with these solvers does not take into account
trains in circulation because the problem becomes intractable for them. The
sequential algorithm takes into account all the problem constraints.

Fig. 7. Average traversal time as a function of the running time

It is important to note that LINGO and CPLEX receive the whole search space
generated by each problem, without any pre-processing to reduce it. Table
1 has two columns for each solver, labeled "Journey Time" and "Running Time".
The "Journey Time" column shows the objective function value at a given
elapsed time, and the "Running Time" column shows the elapsed time measured
from the start of each solver. In this way, the table shows how the objective
function value improves for each solver as the elapsed time increases. The
results shown in Table 1 correspond to the following
instance: number of new trains for each direction: 10; number of locations
visited by each train: 42 (Vigo-A Coruna); frequency of departure: 2:00:00;
initial departure time interval: [7:00:00,8:30:00].



5 Conclusions 

The problem variables are ordered implicitly in the sequential algorithm. The
timetable for t_i is assigned before the timetable for t_{i+1}. The timetable
at the location l_x is established before the timetable at l_{x+1}. With
this ordering, the required backtracking undoes the least number of instantiated
variables. In each iteration, the variable values of the problem are computed from
an initial departure time (t_0_dep_l_k = init_dep_down and t_1_dep_l_h = init_dep_up).
When the initial departure time is modified, a different subset of the whole
search space is obtained, and therefore a different scheduling can be generated.
The reference station establishes the priority order between two trains competing
for the same resource, a single track section. When the value of this parameter
is modified, the priority order between trains is modified as well. Therefore, a



Table 1. Running time to obtain each timetable with different solvers
train that had to wait for the release of a track section in a previous iteration
could be the train with greater priority in another iteration, and its timetable
could then be different. Each iteration with a different combination of values for these
parameters reduces the search space to a different subset. Each iteration is solved
quickly because it only has to explore a reduced part of the whole search space.

Our heuristic consists of obtaining several solutions, each one obtained
from a different subset of the same search space. This is done instead of
searching for a single solution in the whole search space, which in real cases may
be intractable. The sequential algorithm imposes an order among the problem
variables. This order implicitly determines the priority of one train for a resource
(platform in a station or single track section). A new order (and a new solution)
is produced at each iteration. This is the method that we propose for solving a
CSP for this type of problem.
Abstract. Parallel machine scheduling involves the allocation of jobs to the
system resources (a bank of machines in parallel). A basic model consisting of
m machines and n jobs is the foundation of more complex models. Here, jobs
are allocated according to resource availability following some allocation rule.
In the specialised literature, minimisation of the makespan has been extensively
studied and benchmarks can easily be found. This is not the case for other
important objectives such as the due date related objectives. To solve the
unrestricted parallel machine scheduling problem, this paper proposes MCMP-SRI
and MCMP-SRSI, two multirecombination schemes that combine
studs, random immigrants and seed immigrants. Evidence is provided of the
improved behaviour of the EAs when problem-specific knowledge is inserted,
with respect to SCPC (an EA without multirecombination). Experiments and
results are discussed.



1 Introduction 

Unrestricted parallel machine scheduling problems are common in production
systems. The completion time of the last job to leave the system, known as makespan
($C_{max}$), is one of the most important objective functions to be minimised, because it
usually implies high utilization of resources. In a production system it is also usual to
stress minimisation of due date based objectives such as average tardiness ($T_{avg}$),
weighted tardiness ($T_{wt}$), or weighted number of tardy jobs ($N_{wt}$).

Branch and Bound and other partial enumeration based methods, which guarantee
exact solutions, are prohibitively time consuming even with only 20 jobs. To provide
reasonably good solutions in very short time, the scheduling literature offers a set of
dispatching rules and heuristics. Depending on the particular instance of the problem
we are facing, some heuristics behave better than others. Among other heuristics [12],
evolutionary algorithms (EAs) have been successfully applied to solve scheduling
problems [9-11]. Current trends in evolutionary algorithms make use of multiparent
[3-5] and multirecombined approaches [6-8]. We call this latter approach multiple
crossovers on multiple parents (MCMP). Instead of applying crossover once on a pair
of parents, this scheme applies $n_1$ crossover operations on a set of $n_2$ parents. In order
to improve the trade-off between exploration and exploitation in the search process, a
variant called MCMP-SRI [13,14] recombines a breeding individual (stud) by
repeatedly mating it with individuals that randomly immigrate to a mating pool. Under this
approach the random immigrants and the multi-mating operation with the stud
incorporate exploration and exploitation, respectively, into the search process.

When trying to incorporate knowledge into the blind evolutionary search process,
the main issue is how to introduce problem-specific knowledge. If optimality
conditions for the solutions are known in advance, we can restrict the search
to solutions that satisfy these conditions. When optimality conditions are
unknown, which is the case here, one option is to import this knowledge from
solutions produced by heuristics specifically designed for the problem under
consideration. These knowledge-based intermediate solutions contain some of the
features included in the best (optimal or quasi-optimal) solution at the end of the
evolutionary process.

Consequently MCMP-SRSI, the latest variant of MCMP-SRI, includes
a stud-breeding individual in a pool of random and seed-immigrant parents.
Here the seeds generated by conventional heuristics introduce the problem-specific
knowledge. The following sections describe the above-mentioned scheduling
problems and the ways of inserting problem-specific knowledge, and discuss the
results obtained.



2 Scheduling Problems 



The problems we are facing [16] can be stated as follows: n jobs are processed
without interruption on some of the m equal machines belonging to the system; each
machine can handle no more than one job at a time. Job j (j = 1, ..., n) becomes available
for processing at time zero, requires an uninterrupted positive processing time $p_j$ on a
machine, and has a due date $d_j$ by which it should ideally be finished. For a given
processing order of the jobs, the earliest completion time $C_j$ and the tardiness
$T_j = \max\{C_j - d_j, 0\}$ of job j can readily be computed. The problem is to find a processing
order of the jobs with minimum objective values. The objectives to be minimised are:



Average Tardiness: $T_{avg} = \frac{1}{n} \sum_{j=1}^{n} T_j$

Weighted Tardiness: $T_{wt} = \sum_{j=1}^{n} w_j T_j$

Weighted Number of Tardy Jobs: $N_{wt} = \sum_{j=1}^{n} w_j \delta(T_j)$, where $\delta(T_j) = 1$ if $T_j > 0$
and $\delta(T_j) = 0$ otherwise
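
As an illustration, the three objectives can be evaluated for a given processing order as sketched below. The decoding of an order into a schedule is an assumption of this sketch (each job goes to the machine that becomes free first); the function names are ours and do not come from the paper.

```python
import heapq


def completion_times(order, proc, m):
    """List scheduling: each job in `order` goes to the machine that frees first."""
    machines = [0] * m                 # time at which each machine becomes free
    heapq.heapify(machines)
    done = {}
    for j in order:
        start = heapq.heappop(machines)
        done[j] = start + proc[j]
        heapq.heappush(machines, done[j])
    return done


def objectives(order, proc, due, weight, m):
    """Return (T_avg, T_wt, N_wt) for the processing order on m equal machines."""
    done = completion_times(order, proc, m)
    tard = {j: max(done[j] - due[j], 0) for j in order}       # T_j
    t_avg = sum(tard.values()) / len(order)
    t_wt = sum(weight[j] * tard[j] for j in order)
    n_wt = sum(weight[j] for j in order if tard[j] > 0)       # delta(T_j) = 1
    return t_avg, t_wt, n_wt


# Three jobs on two machines: completion times are 3, 2 and 6.
print(objectives([0, 1, 2], proc=[3, 2, 4], due=[2, 5, 4], weight=[1, 2, 3], m=2))
```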



These problems have received considerable attention from different researchers. For
most of them, the computational complexity remained an open research topic for
many years, until they were established as NP-Hard [16].



3 Conventional Approaches to Scheduling Problems 

Dispatching heuristics assign a priority index to every job in a waiting queue. The one
with the highest priority is selected to be processed next. There are different heuristics
[12] for the above-mentioned problems whose main merit is not only the quality
of their results but also that they give an ordering of the jobs (schedule) close to the
optimal sequence. The following dispatching rules and heuristics were selected to
determine priorities, build schedules and contrast their outcomes with those obtained by
the evolutionary algorithms.

LPT (Longest Processing Time first): The job with the longest processing time is
selected first. The final scheduled jobs are ordered satisfying: $p_1 \ge p_2 \ge \ldots \ge p_n$.

WLPT (Weighted Longest Processing Time first): The job with the weighted longest
processing time is selected first. The final scheduled jobs are ordered satisfying:
$(w_1/p_1) \le (w_2/p_2) \le \ldots \le (w_n/p_n)$.

SPT (Shortest Processing Time first): The job with the shortest processing time is
selected first. The final scheduled jobs are ordered satisfying: $p_1 \le p_2 \le \ldots \le p_n$.

WSPT (Weighted Shortest Processing Time first): The job with the weighted shortest
processing time is selected first. The final scheduled jobs are ordered satisfying:
$(w_1/p_1) \ge (w_2/p_2) \ge \ldots \ge (w_n/p_n)$.

EDD (Earliest Due Date first): The job with the earliest due date is selected first. The
final scheduled jobs are ordered satisfying: $d_1 \le d_2 \le \ldots \le d_n$.

SLACK (Least slack): The job with the smallest difference between due date and
processing time is selected first. The final scheduled jobs are ordered satisfying:
$d_1 - p_1 \le d_2 - p_2 \le \ldots \le d_n - p_n$.
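
Since each of the rules above amounts to sorting the jobs by a key, they can be sketched compactly. The dictionary-based job representation and the names RULES and dispatch are assumptions made for this illustration only.

```python
# Each rule maps a job to a sort key; jobs are processed in ascending key order.
RULES = {
    "LPT":   lambda j: -j["p"],            # longest processing time first
    "WLPT":  lambda j: j["w"] / j["p"],    # smallest w/p first
    "SPT":   lambda j: j["p"],             # shortest processing time first
    "WSPT":  lambda j: -j["w"] / j["p"],   # largest w/p first
    "EDD":   lambda j: j["d"],             # earliest due date first
    "SLACK": lambda j: j["d"] - j["p"],    # least slack first
}


def dispatch(jobs, rule):
    """Return the job ids in the order given by the dispatching rule."""
    return [j["id"] for j in sorted(jobs, key=RULES[rule])]


jobs = [
    {"id": 1, "p": 4, "d": 10, "w": 1},
    {"id": 2, "p": 2, "d": 5, "w": 3},
    {"id": 3, "p": 6, "d": 7, "w": 2},
]
for rule in RULES:
    print(rule, dispatch(jobs, rule))
```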

Hodgson's Algorithm: This algorithm gives an optimal schedule for the
unweighted number of tardy jobs objective and behaves well for some instances of
the average tardiness objective. The heuristic provides a schedule according to the
following procedure:

Step 1: Order the activities using the EDD heuristic.

Step 2: If there are no tardy jobs, stop; this is the optimal solution.

Step 3: Find the first tardy job, say k, in the sequence.

Step 4: Move the single job j ($1 \le j \le k$) with the longest processing time to the end of
the sequence.

Step 5: Check the completion times and return to step 2.
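
The five steps can be sketched as follows for a single machine, which is the setting in which the optimality result for the unweighted number of tardy jobs holds. The tuple representation of jobs and the function name hodgson are assumptions of this sketch; jobs moved to the end are set aside and not examined again.

```python
def hodgson(jobs):
    """jobs: list of (name, processing_time, due_date) tuples, single machine."""
    on_time = sorted(jobs, key=lambda j: j[2])        # Step 1: EDD order
    moved = []                                        # jobs sent to the end of the sequence
    while True:
        t, first_tardy = 0, None
        for idx, (_, p, d) in enumerate(on_time):     # Steps 2-3: find the first tardy job k
            t += p
            if t > d:
                first_tardy = idx
                break
        if first_tardy is None:                       # no tardy job left in the front part
            break
        # Step 4: among jobs 1..k, move the one with the longest processing time.
        longest = max(range(first_tardy + 1), key=lambda i: on_time[i][1])
        moved.append(on_time.pop(longest))            # Step 5: re-check from Step 2
    return on_time + moved


sequence = hodgson([("A", 4, 5), ("B", 3, 6), ("C", 2, 8), ("D", 5, 9)])
print([name for name, _, _ in sequence])
```

In this small instance, A is moved first (B is the first tardy job and A is the longest of the first two jobs), then D, leaving B and C on time.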

Rachamadugu and Morton Heuristic (R&M): This heuristic provides a schedule
according to the following priority,

$\pi_j = (w_j / p_j) \exp\{-[S_j]^{+} / (k \, p_{av})\}$

where $S_j = d_j - (p_j + Ch)$ is the slack of job j at time Ch, Ch is the total
processing time of the jobs already scheduled, k is a parameter of the method (usually
k = 2.0) and $p_{av}$ is the average processing time of the jobs competing for top priority. In the
R&M heuristic, jobs are scheduled one at a time. Every time a machine becomes free,
a new ranking index is computed for each remaining job. The job with the highest
ranking index is then selected to be processed next.




Studs, Seeds and Immigrants in Evolutionary Algorithms 177 



4 Multirecombination of Random and Seed Immigrants 
with the Stud 

Multiple Crossovers per Couple (MCPC) [6,7] and Multiple Crossovers on Multiple
Parents (MCMP) [8] are multirecombination methods that improve EA
performance by reinforcing and balancing exploration and exploitation in the search
process. In particular, MCMP is an extension of MCPC in which the multiparent
approach proposed by Eiben [3-5] is introduced. Results obtained on diverse single
and multiobjective optimization problems indicated that the search space is
efficiently exploited by the multiple applications of crossover and efficiently explored by
the greater number of samples provided by the multiple parents.

A further extension of MCMP is known as MCMP-SRI [13,14]. This approach
considers the mating of an evolved individual (the stud) with random immigrants. The
process for creating offspring is performed as follows. The stud is selected from the
old population by means of proportional selection and inserted in the mating pool.
The mating pool is then completed, up to $n_2$ parents, with randomly created
individuals (random immigrants). The stud mates with every other parent, the couples
undergo the crossover operation and $2(n_2 - 1)$ offspring are created. The best of these
$2(n_2 - 1)$ offspring is stored in a temporary children pool. The crossover operation is
repeated $n_1$ times, with different cut points each time, until the children pool is
complete. Finally, the best offspring created from $n_2$ parents and $n_1$ crossovers is inserted
in the new population.
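
The offspring-creation process just described can be sketched for a permutation representation. This is a simplified illustration and not the authors' implementation: the PMX operator is the textbook version, the proportional selection weights 1 / (1 + fitness) are an assumption made so that lower (better) objective values are favoured, and all function names are hypothetical.

```python
import random


def pmx(p1, p2, a, b):
    """Partially mapped crossover: keep p1[a:b], place the rest according to p2."""
    n = len(p1)
    child = [None] * n
    child[a:b] = p1[a:b]
    segment = set(p1[a:b])
    for i in range(a, b):
        gene = p2[i]
        if gene in segment:
            continue
        pos = i
        while a <= pos < b:               # follow the mapping out of the segment
            pos = p2.index(p1[pos])
        child[pos] = gene
    for i in range(n):                    # remaining positions come from p2
        if child[i] is None:
            child[i] = p2[i]
    return child


def mcmp_sri_offspring(population, fitness, n1, n2, rng):
    """Create one offspring from a stud and n2 - 1 random immigrants."""
    # Stud: proportional selection (assumed weighting for a minimisation problem).
    weights = [1.0 / (1.0 + fitness(ind)) for ind in population]
    stud = rng.choices(population, weights=weights, k=1)[0]
    size = len(stud)
    immigrants = [rng.sample(range(size), size) for _ in range(n2 - 1)]
    children_pool = []
    for _ in range(n1):                   # n1 crossovers, new cut points each time
        a, b = sorted(rng.sample(range(size + 1), 2))
        brood = []
        for imm in immigrants:            # the stud mates with every other parent
            brood.append(pmx(stud, imm, a, b))
            brood.append(pmx(imm, stud, a, b))
        children_pool.append(min(brood, key=fitness))   # best of the 2(n2 - 1)
    return min(children_pool, key=fitness)


if __name__ == "__main__":
    proc, due = [3, 2, 4, 1, 5, 2], [4, 6, 9, 3, 14, 8]

    def total_tardiness(order):           # toy single-machine fitness
        t = total = 0
        for j in order:
            t += proc[j]
            total += max(t - due[j], 0)
        return total

    rng = random.Random(0)
    pop = [rng.sample(range(6), 6) for _ in range(15)]
    print(mcmp_sri_offspring(pop, total_tardiness, n1=18, n2=20, rng=rng))
```

For MCMP-SRSI, the only change to this sketch would be that some of the immigrants are replaced by seed solutions produced by the conventional heuristics.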

As EAs are blind search methods, our new variant (MCMP-SRSI) [15] proposes to
insert problem-specific knowledge by recombining potential solutions (individuals of
the evolving population) with seeds, which are solutions provided by other heuristics
specifically designed to solve the scheduling problems under study. In MCMP-SRSI,
the process for creating offspring is similar to that of MCMP-SRI, except that the
mating pool also contains seed immigrants. In this way, the evolutionary algorithm
incorporates problem-specific knowledge supplied by the specific heuristic. Figure 1
displays these processes.

We worked with different indirect representations: processor dispatching priorities
and task priority lists (both are indirect-decode representations), and another based on
permutations.

The results discussed in the next section correspond to EAs that worked on the
permutation-based representation using the PMX crossover operator, because this
combination of representation and operator gave the best results.



5 Experimental Tests and Results 

As it is not usual to find published benchmarks for the scheduling problems we
worked on, we built our own test suite with data ($p_j$, $d_j$, $w_j$) based on 20 selected
instances of weighted tardiness problems of size 40 taken from the OR library
[1, 2].

These data were the input for the dispatching rules, the conventional heuristics, our
proposed multirecombined studs, seeds and immigrants EAs, and SCPC.
Fig. 1. The stud and (random immigrants/seeds and random immigrants) multirecombination
processes



To evaluate the dispatching rules and the conventional heuristics we used
PARSIFAL [12], a software package provided by Morton and Pentico that solves
different scheduling problems by means of different heuristics.

This work shows the improved performance of both multirecombined
EAs (MCMP-SRI and MCMP-SRSI) when compared with an EA without
multirecombination, called SCPC (Simple Crossover Per Couple).

The initial phase of the experiments consisted in establishing the best results from
dispatching rules and conventional heuristics, to use them as upper bounds for the
scheduling objectives. Also, the best parameter values for the EAs were empirically
derived from a set of preliminary experiments. In all the experiments, we
used a population size of 15 and ran the EAs for 200 generations. The values of the
remaining parameters are the following: crossover probability 0.65, $n_1$ = 18, $n_2$ = 20,
seed number = 1 (only for MCMP-SRSI). For each problem and algorithm studied we
performed 30 runs.

To compare the algorithms, the following relevant performance variables were 
chosen: 

Ebest = ((best value - opt_val)/opt_val)*100 

It is the percentage error of the best found individual when compared with the
known or estimated (upper bound) optimum value opt_val. It measures how
far the best individual is from opt_val. When this value is negative, opt_val
has been improved.

Mean Ebest (MEbe): It is the mean value of Ebest throughout all runs.

Mean Best (μBest): It is the mean objective value obtained from the best found
individuals throughout all runs.

Mean Evals (MEvals): It is the mean number of evaluations necessary to obtain the
best found individual throughout all runs. The results presented in this
paper are expressed in units of thousands.

σBest: It is the standard deviation of the objective values corresponding to the best
found individuals throughout all runs, with respect to μBest.

σ/μ Best: This coefficient of variation is calculated as the ratio of σBest to μBest.
It represents the deviation as a percentage of the value μBest. The closer this value is
to zero, the higher the robustness of the results obtained by the EA.
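
The performance variables can be computed from the best objective values of the runs as sketched below. The use of the sample standard deviation is an assumption (the text does not say which estimator was used), and the function and key names are ours.

```python
from statistics import mean, stdev


def ebest(best_value, opt_val):
    """Percentage error of a best value with respect to opt_val."""
    return (best_value - opt_val) / opt_val * 100.0


def summarize(best_values, opt_val):
    """MEbe, mean, standard deviation and coefficient of variation over all runs."""
    mu = mean(best_values)
    sigma = stdev(best_values)            # assumed: sample standard deviation
    return {
        "MEbe": mean(ebest(v, opt_val) for v in best_values),
        "muBest": mu,
        "sigmaBest": sigma,
        "cv": sigma / mu,
    }


# Two hypothetical runs against an upper bound of 100.
print(summarize([90.0, 110.0], opt_val=100.0))
```

A negative MEbe means that, on average, the runs improved on the upper bound used as opt_val.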

Several experiments were performed for scheduling systems with 2 and 5 parallel
equal machines, for the objectives described before. The different
multirecombined EA implementations performed well for both scheduling systems.
In this paper, due to space constraints, we only present the best results obtained
for the scheduling problem with 5 parallel equal machines. Average values over 30
runs for some of the above performance variables, obtained under the SCPC, MCMP-SRI
and MCMP-SRSI approaches, are listed in the following tables. The mean values are
shown at the bottom of each table. In all the tables the first column gives the numbers
of the instances taken from the OR library. In the μBest column, the values shown in
bold italics have improved the corresponding benchmark. In Tables 2 to 4, the
Bench column displays the upper bound obtained from the respective heuristic in the
third column.

In order to compare the performance of an EA without multirecombination
with that of the multirecombined EAs, Table 1 presents the
results found by SCPC with 200 generations per run. Experiments were also carried out
with 500, 1000, 2000, 3000, 4000 and 5000 generations. The precision of
the results obtained in those cases was not much better, while the
average number of evaluations per run increased approximately twofold.

For some instances, depending on the objective, SCPC reached or improved the
upper bound obtained by the conventional heuristics. Nevertheless, considering the
precision of the results found by the multirecombined EAs (MEbe column in Tables
2, 3 and 4), the advantage of using the proposed algorithms on the selected
scheduling problems is clear. It is important to note that the cost (in number of evaluations)
of these algorithms is much higher than the cost of SCPC. Thus, a decision has to be
made when facing this kind of scheduling problem: to obtain high precision results
at a high cost, or lower precision results at a moderate cost.

Tables 2, 3 and 4 show that the multirecombined EAs also performed very well
with respect to the upper bounds. For all the objective functions studied, the average
values of MEbe for MCMP-SRSI are lower than the values obtained with MCMP-SRI,
with the additional advantage that MCMP-SRSI presented in all cases lower
average values for the mean number of evaluations (MEvals). For example, for the
weighted tardiness objective (Table 3) the average number of evaluations for MCMP-SRI
is 1843.60 (in units of thousands), which is higher than the average number
of evaluations performed with MCMP-SRSI (1718.89).

When considering the coefficients of variation of the best values found by the
approaches presented (with and without multirecombination), we see that the values group
around the mean. Although not all the coefficient values are equal to 0.0, they are
very close to it, suggesting the robustness of the algorithms with respect to the results
that they found. In MCMP-SRI and MCMP-SRSI, the coefficients of variation are lower than
those of SCPC, because of the reinforcement in selective pressure.



Table 1. SCPC EA performance for 40 jobs problems size 
Table 2. EAs performance for Average Tardiness 40 jobs problem size

                        MCMP-SRSI                                  MCMP-SRI
N°   Bench   Heuristic  MEbe    μBest      σBest   σ/μ   MEvals    MEbe    μBest      σBest   σ/μ   MEvals
…    …       …          …       …          …       …     …         -11.71  430.84     1.60    0.00  1900.94
96   760     SPT        -1.36   749.68     0.57    0.00  1664.64   -1.31   750.08     0.59    0.00  1877.48
111  372     R&M        -17.17  308.15     2.44    0.01  1953.98   -16.97  308.86     2.79    0.01  1952.28
116  379     SPT        -9.96   341.26     1.96    0.01  1864.90   -10.29  340.01     1.43    0.00  1947.52
121  706     SPT        -2.32   689.61     0.48    0.00  1599.02   -2.22   690.35     0.80    0.00  1818.32
AVG                     -10.68  344.98     1.21    0.01  1615.07   -9.71   345.44     1.43    0.01  1777.13
Table 3. EAs performance for Weighted Tardiness 40 jobs problem size

                        MCMP-SRSI                                  MCMP-SRI
N°   Bench   Heuristic  MEbe    μBest      σBest   σ/μ   MEvals    MEbe    μBest      σBest   σ/μ   MEvals
1    3800    R&M        -34.43  2491.57    214.36  0.09  1226.72   -34.70  2481.47    197.10  0.08  1626.90
6    15900   WSPT       -32.14  10789.37   215.57  0.02  1725.50   -32.15  10788.60   161.87  0.02  1739.10
11   30600   WSPT       -16.49  25554.53   266.87  0.01  1612.62   -16.58  25527.37   261.54  0.01  1775.14
19   102000  WSPT       -4.67   97232.16   307.03  0.00  1734.68   -4.57   97342.10   306.40  0.00  1891.08
21   95800   R&M        -0.15   95660.37   29.11   0.00  1657.16   0.06    95860.57   108.56  0.00  1953.98
26   2570    R&M        -63.03  950.03     307.34  0.32  1156.00   -61.56  988.00     314.24  0.32  1370.54
31   20400   R&M        -42.09  11814.07   277.28  0.02  1761.20   -40.64  12109.97   638.43  0.05  1859.12
36   44200   WSPT       -31.84  30125.50   891.75  0.03  1885.30   -32.82  29695.10   674.90  0.02  1895.84
41   86300   WSPT       -11.96  75974.87   487.31  0.01  1785.00   -11.69  76208.43   442.93  0.01  1856.06
46   81100   WSPT       -0.54   80661.13   40.31   0.00  1693.54   -0.32   80844.07   95.05   0.00  1941.06
56   15000   EDD        -72.41  4137.97    342.04  0.08  1580.66   -71.98  4203.13    363.34  0.09  1673.14
61   47800   R&M        -28.41  34219.77   527.87  0.02  1915.22   -28.91  33979.10   438.29  0.01  1904.00
66   95900   WSPT       -9.58   86711.03   255.22  0.00  1801.66   -9.36   86920.16   280.85  0.00  1907.40
71   114000  R&M        -1.23   112598.73  69.78   0.00  1767.32   -1.04   112812.57  133.91  0.00  1921.68
86   41200   HODG.      -40.20  24636.10   750.69  0.03  1866.94   -40.70  24429.60   618.85  0.03  1893.46
91   82000   R&M        -16.41  68545.13   327.05  0.00  1897.54   -16.22  68700.30   296.31  0.00  1953.98
96   154000  R&M        -1.46   151753.27  111.06  0.00  1773.78   -1.38   151872.20  193.80  0.00  1953.30
111  69000   R&M        -25.79  51201.77   512.17  0.01  1949.90   -25.90  51127.96   478.44  0.01  1916.92
116  71900   R&M        -3.92   69082.50   408.93  0.01  1867.96   -4.20   68881.16   380.58  0.01  1946.16
121  153000  R&M        -1.72   150370.44  179.70  0.00  1719.04   -1.56   150612.70  235.67  0.00  1893.12
AVG                     -21.92  59225.52   326.07  0.03  1718.89   -21.81  59269.23   331.05  0.03  1843.60



Table 4. EAs performance for Weighted Number of Tardy Jobs 40 jobs problem size

                        MCMP-SRSI                                  MCMP-SRI
N°   Bench   Heuristic  MEbe    μBest      σBest   (σ/μ)  MEvals   MEbe    μBest      σBest   (σ/μ)  MEvals
1    25      WSPT       -57.60  10.60      1.07    0.10   719.44   -56.93  10.77      1.01    0.09   648.04
6    54      WSPT       -56.67  23.40      2.30    0.10   762.28   -54.14  24.77      2.36    0.10   1031.22
11   62      WSPT       -32.80  41.67      2.22    0.05   714.34   -27.90  44.70      2.68    0.06   1107.04
19   139     WSPT       -15.30  117.73     2.32    0.02   687.82   -14.34  119.07     2.52    0.02   925.48
21   163     SPT        -0.61   162.00     0.00    0.00   21.08    -0.61   162.00     0.00    0.00   42.84
26   21      EDD        -81.90  3.80       1.00    0.26   698.36   -79.05  4.40       1.00    0.23   718.76
31   51      WSPT       -49.15  25.93      2.42    0.09   819.74   -48.76  26.13      2.40    0.09   1144.44
36   81      SPT        -30.25  56.50      4.07    0.07   983.28   -38.02  50.20      2.94    0.06   1245.42
41   98      WSPT       -17.41  80.93      3.33    0.04   821.78   -15.48  82.83      4.33    0.05   870.40
46   191     HODG.      -6.28   179.00     0.00    0.00   112.20   -6.28   179.00     0.00    0.00   93.16
56   51      WSPT       -75.36  12.57      2.70    0.21   1010.48  -75.95  12.27      2.26    0.18   865.98
61   93      WSPT       -51.47  45.13      3.15    0.07   1228.08  -50.50  46.03      2.41    0.05   1212.78
66   155     WSPT       -32.80  104.17     3.46    0.03   888.42   -31.59  106.03     3.40    0.03   1094.12
71   194     SPT        -13.95  166.93     1.64    0.01   573.58   -13.90  167.03     2.08    0.01   635.46
86   110     WSPT       -58.03  46.17      5.42    0.12   1351.16  -55.88  48.53      5.07    0.10   1456.22
91   121     WSPT       -35.70  77.80      3.60    0.05   880.60   -35.87  77.60      2.43    0.03   895.22
96   192     WSPT       -7.81   177.00     0.00    0.00   198.56   -7.81   177.00     0.00    0.00   183.26
111  95      WSPT       -21.23  74.83      2.53    0.03   681.70   -24.35  71.87      1.96    0.03   834.36
116  148     SPT        -38.29  91.33      4.03    0.04   946.22   -39.05  90.20      2.78    0.03   853.74
121  180     HODG.      -5.98   169.23     0.77    0.00   571.88   -6.07   169.07     0.25    0.00   387.26
AVG                     -34.43  83.34      2.30    0.06   733.55   -34.12  83.48      2.09    0.06   812.26

Figure 2 shows the μBest values corresponding to Table 3. Although the differences
among the multirecombinated methods are small, and therefore difficult to appreciate
graphically, for this objective SCPC improved the benchmarks of the fewest instances.



Fig. 2. Weighted Tardiness μBest values found by the EAs (series: Benchmark, SCPC,
MCMP-SRI, MCMP-SRSI)



6 Conclusions 

Multirecombinated evolutionary algorithms have been successfully used to solve
scheduling problems. In particular, MCMP-SRI and MCMP-SRSI, the approaches
considered here, have demonstrated their ability on unrestricted parallel machine
due-date-related scheduling problems by improving on both the results found by an EA
without multirecombination and the upper bounds calculated with different
conventional heuristics using PARSIFAL [12], for various problem data taken from the
OR-Library.

It is known that in some cases seeding individuals into the population is equivalent
to running the EA without seeds for a few more generations. However, our studies
indicate that this is true for small and medium scheduling problem sizes, and that it
also depends on the hardness of the problem instance and the type of EA used. When
comparing MCMP-SRSI with MCMP-SRI, the former finds good results at a lower cost
(number of evaluations), thanks to the problem knowledge used to guide the search
towards promising areas of the solution space. Thus, using seeds in the population
instead of running the EA for more generations has the advantage of speeding up the
convergence of the EA towards good solutions at a lower cost.

Future work will be devoted to solving due-date-related problems in parallel machine
scheduling systems for larger numbers of jobs, and to comparing the performance of
the different EAs implemented with other population-based stochastic search
heuristics.



Abstract. We investigate the use of genetic algorithms to solve
the class of STRIPS planning problems in Artificial Intelligence. We
define a genetic algorithm called AgPlan and compare it with some
recent planners.

Keywords: Planning, Evolutionary Computation. 



1 Introduction 

Given a formal description of the behaviour of a set of actions, finding a sequence
of actions that leads from a known state of the world to some desired one is known as
the Planning Problem in Artificial Intelligence. When the formal language used to
describe actions is STRIPS [1], we call it STRIPS Planning. This restricted class of
problems was proved to be PSPACE-complete [2].

For more than twenty years the classical AI search techniques were not sufficient to
find solutions even for simple problems such as the Sussman anomaly, a simple problem
with three blocks in the well-known blocks world domain. One main problem is that the
search space is really huge. Moreover, there were no good heuristic functions to
guide the process. The proposed approaches failed, getting trapped very easily in
local extrema.

Two recent approaches changed that picture. One was SATPLAN, whose main feature was
to relate planning with satisfiability [3]. The other one was GRAPHPLAN [4]. It is
based on a structure called the plan graph, which successfully reduced the huge
search space to a smaller part of it where a solution can more easily be found.

Since then, research in the planning field has seen an extraordinary growth. Several
other areas of research have contributed important algorithms and tools. We can cite
integer programming [5], constraint satisfaction [6] and Petri nets [7], among
others. Also, older techniques were reintroduced, e.g. the system R, a new
implementation of the original STRIPS, has managed to solve really huge instances of
difficult problems such as the blocks world [8]. Of particular interest to us, very
good heuristic functions were finally found, and the heuristic search planners
HSP [9] and FF [10] are among the best planners known today. Although some of these
algorithms are very fast in general and are capable of dealing with large instances
of problems, there is still open research to be done.

Genetic algorithms [11] (GAs) are usually considered as a possible technique to apply
when the search space is huge. Surprisingly, little has been done to improve genetic
techniques in domain-independent planning. The main contributions seem to be related
to robotics, but in terms of genetic programming [12, 13]. However, in planning
research it is usually considered that planners must be general purpose [14]. In this
sense, as far as we could investigate, there is no generic planner based on genetic
algorithms. In this paper we show the results of our research in this direction.

The paper is organised as follows. Section 2 contains some basic points about
planning. In section 3 we introduce genetic algorithms in the context of planning and
present the details of our genetic algorithm. In section 4 we show how it works when
applied to a few well-known domains used in the planning competitions.



2 The Planning Problem 

The planning problem in Artificial Intelligence is, basically, to determine a
sequence of actions which, when applied to some known initial state, leads to some
desired final state. This problem is considered to be domain independent. The planner
must be capable of reading a planning problem consisting of a description of the
behaviour of the actions and a description of the initial and the final situation. It
must return a sequence of actions that, when performed in the given order on the
initial situation, transforms it into the final situation. In our case, the planning
problems are described in PDDL [15], which is considered today the common generic
language of the planning community.

Any sequence of actions that does so is called a plan. Unfortunately, there is no
fixed size for the plan. In most situations, it is supposed that the planner must
return the optimal plan. On the other hand, as this is difficult, just finding some
plan is considered to be enough. Usually, the planner must run with limited memory
and time. In this paper we consider that any plan which solves the problem is a
solution. We do not consider questions related to the quality of the plan or the
number of actions used. In other terms, we do not consider the planning problem as an
optimisation problem.
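
To make the definition of a plan concrete, the following minimal Python sketch checks whether a sequence of ground STRIPS actions leads from an initial state to a goal. The `Action` class, the function names and the toy one-ball instance are illustrative assumptions, not part of AgPlan, and PDDL parsing is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A ground STRIPS action: preconditions, add list and delete list."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

def applicable(state, action):
    """An action is applicable when all its preconditions hold in the state."""
    return action.pre <= state

def apply_action(state, action):
    """Successor state: remove the delete list, then add the add list."""
    return (state - action.delete) | action.add

def is_plan(initial, goal, actions):
    """True if every action is applicable in turn and the goal holds at the end."""
    state = frozenset(initial)
    for action in actions:
        if not applicable(state, action):
            return False
        state = apply_action(state, action)
    return frozenset(goal) <= state

# Hypothetical toy instance: move one ball from room a to room b.
pick = Action("pick", frozenset({"ball-at-a", "hand-empty"}),
              frozenset({"holding"}), frozenset({"ball-at-a", "hand-empty"}))
drop = Action("drop", frozenset({"holding"}),
              frozenset({"ball-at-b", "hand-empty"}), frozenset({"holding"}))
INITIAL = {"ball-at-a", "hand-empty"}
GOAL = {"ball-at-b"}
```

Here `is_plan(INITIAL, GOAL, [pick, drop])` holds, whereas the reversed sequence is rejected because `drop` is not applicable in the initial state.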

As we have said in the introduction, there are several reasons why we think genetic
algorithms apply well to planning problems, although it is not a trivial task to
elaborate the encodings of the GA. The next section describes this task in detail.



3 Preliminary Discussion on Genetic Planners

In this section we recall the main points. A more complete description may be 
found in [16]. In the next section we present alternative definitions. 

3.1 Chromosome Encoding 

The first basic point is how to encode a chromosome. Chromosomes are candidate
solutions to the problem being solved; in our case, they are possible plans. Each
gene represents a single action of the plan. Consequently, the whole chromosome is
the full plan, i.e., a complete solution. An action is represented by an integer
(from 1 to the possible number of actions). Here comes the first difficult decision:
plans are sequences of actions with no pre-specified length, and in general
chromosomes have fixed size.

Our first decision was to simulate variable length using fixed-length chromosomes,
with the inclusion of a new action in the list of possible actions (“NOP”, no
operation). This is a null action that can be applied at any point of a plan and has
no effect on the current state of the world. With the use of this extra action a
given plan can be extended as wanted, without losing its original effect.
Variable-length plans were then handled by fixed-length chromosomes, by setting an
upper bound on the length of plans and filling them with NOPs. This decision was well
suited to our experiments, since we were using a standard package called pgapack 1 .

But we found in practice that we spent most of the time taking care of the simulation
process, and the planner was too slow. The final decision was to define a
variable-length chromosome. The impact of this decision was that we decided to
abandon that package and to develop all the code ourselves.
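
The first, fixed-length encoding with NOP padding can be sketched as follows. This is an illustration only: the code `0` for NOP and the function names are assumptions, since the text states only that real actions are numbered from 1.

```python
NOP = 0  # assumed code for the null action; real actions are numbered from 1

def pad_plan(plan, length):
    """Encode a variable-length plan as a fixed-length chromosome."""
    if len(plan) > length:
        raise ValueError("plan is longer than the chromosome")
    return list(plan) + [NOP] * (length - len(plan))

def strip_nops(chromosome):
    """Recover the plan by dropping every NOP gene, wherever it occurs."""
    return [gene for gene in chromosome if gene != NOP]
```

Because NOP has no effect on the state, `strip_nops` returns an equivalent plan regardless of where the NOP genes ended up after recombination.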

3.2 The Fitness Function 

The fitness function measures the quality of a chromosome; it must assign the highest
possible value to the individuals which are in fact plans. The question is: how do we
know that a sequence of actions is a plan? The answer is that the chromosome must be
validated: the action represented by its first gene is applied to the initial state,
followed sequentially by the actions represented by the other genes, until the final
one yields the goal state of the problem.

Suppose a chromosome with size $n$ and genes $g_1, g_2, \ldots, g_n$. It may happen
that $g_1$ is not applicable to the initial state, but $g_2$ is. It is possible that
$g_3$ is not applicable in the state resulting from the application of $g_2$, but
$g_4$ is. One possible decision is not to allow this kind of situation. We have,
however, taken another way.

Our decision was to define an auxiliary binary vector, parallel to the chromosome,
that records the executability of the actions represented by the genes, starting from
the initial state. Consider the first non-zero element in this auxiliary vector, and
say that it is in position $i$. This means that all actions associated with the genes
in positions 1 to $i - 1$ are not applicable to the initial state, but the one in the
$i$-th position is. Now say that the next non-zero element in the auxiliary vector is
in position $j$. This means that none of the actions represented by the genes in
positions $i + 1$ to $j - 1$ is applicable to the state resulting from the
application of the previous valid action, but the one in position $j$ is.

1 http://www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.html
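
A minimal sketch of how such an executability vector could be computed in the forward direction is given below. The representation of actions as (preconditions, add list, delete list) triples and the two-action toy domain are assumptions made for the example.

```python
def executability_vector(chromosome, actions, initial_state):
    """Return (vector, final_state).

    vector[k] is 1 when gene k was applicable in the state produced by the
    previously executed genes, and 0 when it had to be skipped.
    """
    state = frozenset(initial_state)
    vector = []
    for gene in chromosome:
        pre, add, delete = actions[gene]
        if pre <= state:
            state = (state - delete) | add   # execute the action
            vector.append(1)
        else:
            vector.append(0)                 # skipped: the state is unchanged
    return vector, state

# Hypothetical toy domain: action id -> (preconditions, add list, delete list)
ACTIONS = {
    1: (frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"})),  # move a -> b
    2: (frozenset({"at-b"}), frozenset({"at-c"}), frozenset({"at-b"})),  # move b -> c
}
```

For the chromosome `[2, 1, 1, 2]` started in `{"at-a"}`, the first and third genes are skipped and the vector is `[0, 1, 0, 1]`, so $i = 2$ and $j = 4$ in the notation above.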

But this is not enough: we need to measure how far a sequence of actions is from
being a solution, if it is not one. Given that we have two vectors, one containing a
candidate solution and the other containing information about the executability of
the actions in the first vector, our fitness function is a combination of information
extracted from both vectors.

Another point to consider is that a chromosome may not be a solution, and yet be a
good sequence of actions that leads us close to a solution. Such chromosomes may be
considered good beginnings of plans. On the other hand, a chromosome may be a very
good sequence of actions which reaches the final state, but not from the initial
state. Such chromosomes may be considered good endings of plans. We then leave it to
the crossover operator to put the good parts together, forming a complete plan.

Now, our fitness function will be a combination of the following information: 

1. the number of executable actions in the progressive direction, that is, the 
number of non-zero elements on the vector of executability when the valida- 
tion is analysed from the initial state; 

2. the number of executable actions in the regressive direction, that is, the num- 
ber of non-zero elements on the vector of executability when the validation 
is analysed from the final state; 

3. the number of propositions of the goal state reached when the associated 
plan is applicable from the initial state; 

4. the number of propositions of the initial state reached when the associated
plan is applicable from the final state.

We have also tried several other pieces of information, e.g.:

1. the number of consecutive executable actions; 

2. the number of valid actions; 

3. the position of the first executable action; 

4. the position of the last executable action; 

5. the number of actions of the largest feasible subsequence (fragment of the 
plan) from the beginning of it; 

6. the number of applicable actions of the largest partially feasible subsequence,
from the beginning of it. This factor differs from the previous one because
it allows non-applicable actions to be present in a feasible subsequence.
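
Several of the measures listed above can be read directly off the binary executability vector. The sketch below shows one possible reading of items 1 to 5; the dictionary keys and the exact interpretation of each item are assumptions, since the text does not give formal definitions.

```python
def vector_statistics(vector):
    """Summarise a 0/1 executability vector (positions are 1-based, None if no 1)."""
    ones = [k + 1 for k, bit in enumerate(vector) if bit]
    longest = run = 0
    for bit in vector:
        run = run + 1 if bit else 0          # length of the current run of 1s
        longest = max(longest, run)
    prefix = 0
    for bit in vector:                       # feasible fragment from the beginning
        if not bit:
            break
        prefix += 1
    return {
        "executable": len(ones),             # number of valid actions
        "first": ones[0] if ones else None,  # position of first executable action
        "last": ones[-1] if ones else None,  # position of last executable action
        "longest_run": longest,              # consecutive executable actions
        "feasible_prefix": prefix,           # feasible subsequence from the start
    }
```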

The idea was to obtain a high quality index for a candidate solution based on its
number of feasible actions. The larger the number of feasible actions in an
individual, and the better organised they are, the higher its quality. It is in fact
difficult to establish a good fitness function. If we try to maximise executability,
the resulting chromosomes have lots of useless sequences of actions, such as
unloading the truck right after having loaded it. If we try to give high values to
individuals which lead to a great number of goals, then we get no solution.

3.3 The Genetic Operators 

We define three operators: mutation, crossover and a new one called compression. 

The classical mutation operator was used to ensure an even exploration of the search
space, so as to prevent the algorithm from getting stuck in local extrema. This
operator randomly chooses a gene and changes its value to a random integer ranging
from 0 to the size of the list of possible actions. The operator is applied to every
gene of the chromosome with a given small probability. This is shown in figure 1.




Fig. 1. Example of mutation 
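
The mutation operator described above can be sketched in a few lines of Python. The function name, the `rng` parameter and the default rate are illustrative assumptions; the integer range 0 to `n_actions` follows the text.

```python
import random

def mutate(chromosome, n_actions, rate=0.01, rng=random):
    """Return a copy in which each gene is replaced, with probability `rate`,
    by a random integer in [0, n_actions]."""
    return [rng.randint(0, n_actions) if rng.random() < rate else gene
            for gene in chromosome]
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which is convenient when comparing parameter settings.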



The crossover operator is generally implemented in such a way that a single point is
randomly chosen in the chromosome and applied to both parents. Here, however, because
we defined variable-length chromosomes, we decided to implement a different version:
we randomly choose two points, which are the recombination points for the first and
the second individual respectively. This is shown in figure 2.



Fig. 2. Example of our crossover
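
One way to realise a crossover with an independent cut point per parent is sketched below. Swapping the tails after each cut is an assumption of this sketch, as the figure that shows the exact exchange is not reproduced here.

```python
import random

def two_cut_crossover(parent_a, parent_b, rng=random):
    """Cut each parent at its own random point and swap the tails."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2
```

The children generally differ in length from their parents, which is how plan length can vary during the search. With these cut ranges a child can come out empty, so a caller may want to redraw the cut points in that case.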



Due to the way we chose to construct the chromosomes, allowing them to contain
non-executable sequences of actions (which is why we work with a parallel binary
vector for executability), it was necessary to define a new operator called
compression. The reason is that at some moments our chromosomes become too big. This
operator looks for null elements in the binary vector of executability; when found,
they are removed from both vectors, the one defining the chromosome and the binary
one recording the executability of actions. This is shown in figure 3.



Fig. 3. The compression operator
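
The compression operator amounts to filtering both vectors by the executability bits. A minimal sketch, with an assumed function name:

```python
def compress(chromosome, executability):
    """Drop every gene whose executability bit is 0, from both vectors."""
    if len(chromosome) != len(executability):
        raise ValueError("vectors must have the same length")
    kept = [(g, e) for g, e in zip(chromosome, executability) if e]
    return [g for g, _ in kept], [e for _, e in kept]
```

For example, compressing the chromosome `[2, 1, 1, 2]` with vector `[0, 1, 0, 1]` leaves the chromosome `[1, 2]` and the vector `[1, 1]`.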



3.4 The Running Parameters 

In this section we describe how we generate the initial population, the method of
replacement of the population and the rates of application of the genetic operators.

There are two possible ways to generate the initial population. The first is to let
it be chosen randomly. The other way is to have some control over the process. In all
the experiments we did with a randomly initialised initial population we got very
disappointing results, for every possible combination of the genetic parameters. To
control the generation of the initial population we used the plan graph, in its
relaxed and full versions. The idea is to choose actions appearing in the initial
layers as the first genes, and actions of the last layers as the last part of the
chromosomes. As we will show in the next section, this has a great impact on our
approach.

The algorithm is as follows: construct the plan graph and, for each layer, randomly
choose how many actions to take from that layer, then randomly choose that many
actions from the layer to add to the chromosome. Doing this for all layers gives us
individuals of different sizes.
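
The layer-by-layer generation can be sketched as follows, assuming the plan graph has already been built and is given as a list of action layers. Drawing the count uniformly between 0 and the layer size, and sampling without repetition inside a layer, are assumptions of this sketch.

```python
import random

def random_individual(layers, rng=random):
    """Build one chromosome by sampling actions layer by layer."""
    chromosome = []
    for layer in layers:
        count = rng.randint(0, len(layer))            # how many actions to take
        chromosome.extend(rng.sample(layer, count))   # which ones, no repetition
    return chromosome

def initial_population(layers, size, rng=random):
    """Generate `size` individuals, typically of different lengths."""
    return [random_individual(layers, rng) for _ in range(size)]
```

By construction, actions from earlier layers always precede actions from later layers in every individual.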

The selection method has a great impact on the convergence of the GA. In this work we
decided to replace the entire population.

The genetic operators were applied with the following probabilities: 1% for mutation
and 80% for crossover, which are within the usual range for GAs, and 30% for
compression. These values were obtained after a number of experiments.

For the fitness function we got our best results with values in the range 0.4 to 0.45
for the two parameters controlling the number of goals (backwards and forwards) and
values in the range 0.05 to 0.1 for the variables controlling the executability of
actions. In the majority of cases we evaluate both forwards and backwards.
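
As an illustration of how these weights might combine the four measures of section 3.2, the sketch below uses a plain weighted sum. The linear form, the normalisation of each measure to [0, 1] and the parameter names are assumptions; only the weight ranges come from the text.

```python
def fitness(goals_fwd, goals_bwd, exec_fwd, exec_bwd,
            w_goal_fwd=0.45, w_goal_bwd=0.45, w_exec_fwd=0.05, w_exec_bwd=0.05):
    """Weighted sum of four measures, each assumed already normalised to [0, 1]."""
    return (w_goal_fwd * goals_fwd + w_goal_bwd * goals_bwd
            + w_exec_fwd * exec_fwd + w_exec_bwd * exec_bwd)
```

With these weights, reaching goal propositions counts for far more than executability alone, which matches the observation in section 3.2 that maximising executability by itself produces useless action sequences.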

It must be said, however, that the most important running parameter is the plan graph
based generation of the initial population. The GA finishes when a solution has been
found or a previously specified maximum number of generations has been reached.
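
Putting the pieces together, a generational loop with full population replacement and the two termination conditions could look like the sketch below. The operators are passed in as functions; the binary tournament used to pick parents is an assumption, since the text does not say how parents are chosen.

```python
import random

def run_ga(population, fitness, is_solution, crossover, mutate, compress,
           max_generations=30, p_cross=0.8, p_compress=0.3, rng=random):
    """Generational GA that replaces the whole population each generation.

    Stops when an individual satisfies `is_solution` or after `max_generations`.
    Returns (solution or None, generation reached).
    """
    def pick():
        # Binary tournament: an assumption of this sketch.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for generation in range(max_generations + 1):
        for individual in population:
            if is_solution(individual):
                return individual, generation
        if generation == max_generations:
            break
        offspring = []
        while len(offspring) < len(population):
            mum, dad = pick(), pick()
            kids = crossover(mum, dad) if rng.random() < p_cross else (mum, dad)
            for kid in kids:
                kid = mutate(kid)              # per-gene rate lives inside `mutate`
                if rng.random() < p_compress:
                    kid = compress(kid)
                offspring.append(kid)
        population = offspring[:len(population)]   # full replacement
    return None, max_generations
```

The default probabilities are the ones reported above, and the default of 30 generations is the limit used in the gripper experiments of section 4.1.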

In the next section we show some experiments running the GA with different running
parameters for the size of the initial population and the number of generations.



4 Experiments

We have carried out several experiments with AgPlan and we compare it with some of
the best known planners, all of them based on the plan graph construction:

- IPP: implements the GRAPHPLAN algorithm [17];

- BLACKBOX: replaces the second phase of GRAPHPLAN by a translation to a SAT
instance, which is then solved using walksat [18];

- FF: implements a heuristic search planner and is considered the fastest of the
current planners. Its heuristic function is based on the relaxed plan graph [10].

We chose three classical scenarios used in the planning competitions: logistics,
gripper [19] and blocks-world [20]. For lack of space we show only part of our
experiments, giving two instances of each problem, including some easy and some
difficult ones.

AgPlan implements the GA described in section 3 of this paper. AgPlan-r implements
the GA with a randomly initialised initial population. AgPlan-m implements the
generation of the initial population using the complete plan graph, i.e., considering
the mutex relation.

The implementation was written in C and C++ for the GNU/Linux environment. The tests
were run on a dual 2.6 GHz Xeon with 4 GB of RAM in December 2003 [21], as part of a
graduate course on Planning in AI.

4.1 Results for Gripper 

Figure 4 shows the results for the gripper problem with 22 balls (left table) and 42
balls (right table). They are considered very difficult problems. In both cases we
used 30 as the maximum number of generations. We used a population of size 700 for
the problem with 22 balls and 3600 for the other. We analyse the size of the
resulting plan and the time to obtain it.

Only FF and AgPlan managed to solve these two huge problem instances. It is true that
FF is faster, but this may still be considered a good result for AgPlan. Moreover, we
can see the importance of using the plan graph to construct the initial population:
AgPlan-r had no good results. The initialisation considering mutex seems to slow the
algorithm down.

The plans are longer than those of FF, but a careful analysis shows that most of the
actions are of the type “load” followed by “unload”. This can be solved in a
post-processing step, as is done by some planners, e.g., the system R [8].

4.2 Results for Blocks World 

Figure 5 shows our experiments with the classical Sussman anomaly (left table) and
with another instance in which 8 blocks must be put in the inverse position (right
table). The population had up to 9000 individuals to solve the latter. They are
presented (in order) below.

AgPlan seems to apply well to this domain. Even the version with a randomly
initialised initial population ran faster than IPP and BLACKBOX. It found a shorter
plan than FF. Here the initial population seems to have no other importance, except
that the random case is slower and yields bigger plans. The results are satisfactory.



Fig. 4. Experiments with gripper

Fig. 5. Experiments with blocks-world

4.3 Results for Logistics 

The only planners that managed to solve logistics problems were FF and IPP. The
latter solved only two of them. The reason seems to be the number of layers in the
relaxed plan graph, which is very small due to the high degree of parallelism of its
actions.



5 Conclusion 

In this paper, a methodology for dealing with planning problems using Genetic
Algorithms was presented. The proposed GA has features specially devised for this
kind of problem, including a variable-length chromosome for variable-size plans and a
special genetic operator, namely compression. We have also implemented a special
version of crossover with two points.

Our planner is general purpose and works only with typed versions of the scenarios.
Previous experiments show that the non-typed versions give us very poor results. One
possible way to overcome that is to implement automatic type verification. This
problem was investigated by [22] and [16].

This methodology is still under development, but the results are encouraging.
Although many experiments are yet to be done, we believe that it will be particularly
interesting for problems that cannot be addressed with current traditional methods.
In particular, based on hundreds of experiments, we think that the generation of the
initial population based on the plan graph construction is the key to the success of
our GA.

Future work includes experiments with different genetic parameters in order to
improve the algorithm for harder problems, especially those involving domains with a
huge number of parallel actions, such as logistics. This may include the use of local
search or island parallel genetic algorithms.



Abstract. Inductive Logic Programming (ILP) systems have been widely
applied to classification problems with considerable success.
The use of ILP systems in problems requiring numerical reasoning
capabilities has been far less successful. Current systems have very limited
numerical reasoning capabilities, which limits the range of domains where
the ILP paradigm may be applied.

This paper proposes improvements in the numerical reasoning capabilities
of ILP systems. It proposes the use of statistical-based techniques
like Model Validation and Model Selection to improve noise handling, and
it introduces a new search stopping criterion based on the PAC method
to evaluate learning performance.

We have found these extensions essential to improve on the results of the
statistical-based algorithms for time series forecasting used in the
empirical evaluation study.



1 Introduction 

Inductive Logic Programming (ILP) [1] has achieved considerable success in domains
like biochemistry [2] and language processing [3]. The success of those applications
is mainly due to the intelligibility of the induced models. Those models are
expressed in the powerful language of first-order clausal logic. In the domains just
mentioned, the background knowledge is mainly of a relational nature. Theoretically,
there is no impediment to using whatever knowledge is useful for the induction of a
theory. For some applications it would be quite useful to include methods and
algorithms of a numerical nature as background knowledge. Such an ILP system would be
able to harmoniously combine relations with “numerical methods” in the same model. A
proper approach to dealing with numerical domains would therefore extend the
applicability of ILP systems.

Current ILP approaches [4] to numerical domains usually carry out a search through
the model (hypothesis) space looking for a minimal value of a cost function like the
Root Mean Square Error (RMSE). Systems like TILDE [5] are of that kind. One problem
with the minimisation of RMSE in noisy domains is that the models tend to be brittle.
The error is small when covering a small number of examples. The end result is a
large set of clauses to cover the complete set of examples. This is a drawback for
the intelligibility of ILP-induced models. It is also an obstacle to the induction of
a numerical theory, since we end up with small, locally fitted sub-models that may
not correspond to the overall structure of the underlying process that generated the
data.

In this paper we introduce improvements to the numerical reasoning capabilities of
ILP systems by adopting statistical-based noise handling techniques such as: (i)
model validation; and (ii) model selection. We also propose a new stopping criterion
based on the PAC [6] method to evaluate learning performance. The proposals lead to
considerable improvements in the ILP system used in the empirical evaluation. The
experimental results show that an ILP system extended with such procedures compares
very well with statistical methods.

The rest of the paper is organised as follows. Section 2 identifies the steps of a
basic ILP algorithm that are subject to the improvements proposed in this paper. The
proposals for Model Validation are discussed in Section 3. In Section 4 we propose
the stopping criterion. The proposals for Model Selection are discussed in Section 5.
Section 6 presents the experimental findings. Related work is discussed in Section 7.
Finally, in Section 8 we draw the conclusions.



2 Search Improvements 

In ILP, the search procedure is usually an iterative greedy set-covering algorithm
that finds the best clause on each iteration and removes the covered examples. Each
hypothesis generated during the search is evaluated to determine its quality. A
widely used approach in classification tasks is to score a hypothesis by measuring
its coverage, that is, the number of examples it explains. In numerical domains it is
common to use the RMSE or the Mean Absolute Error (MAE) as a score measure.
Algorithm 1 presents an overview of the procedure and identifies the steps modified
by our proposals.
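
For reference, the two score measures have the following standard definitions, where $y_i$ denotes the observed value, $\hat{y}_i$ the predicted value and $n$ the number of examples (this notation is assumed here and is not taken from the paper):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
```

Because RMSE squares each residual, a few large errors weigh more heavily on it than on MAE.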



Algorithm 1 Basic cycle of a greedy set-covering ILP algorithm

1: repeat
2:   repeat
3:     synthesise a hypothesis
4:     accept a hypothesis (Model Validation)
5:     update best hypothesis (Model Selection)
6:   until Stopping Criterion satisfied (PAC-Based Stopping Criterion)
7:   remove explained examples
8: until “All” examples explained
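
The control flow of Algorithm 1 can be sketched in Python as follows. This is a generic illustration, not the authors' system: hypotheses are modelled as predicates over examples, and `candidates`, `is_valid`, `score` and `good_enough` are hypothetical hooks standing in for hypothesis synthesis, model validation, model selection and the stopping criterion.

```python
def greedy_cover(examples, candidates, is_valid, score, good_enough):
    """Greedy set-covering loop in the shape of Algorithm 1.

    candidates(remaining) yields hypotheses; a hypothesis is a predicate
    h(example) -> bool telling whether it explains the example.
    Returns (theory, unexplained examples).
    """
    remaining = list(examples)
    theory = []
    while remaining:                                   # step 8
        best = None
        for h in candidates(remaining):                # step 3
            if not is_valid(h, remaining):             # step 4: model validation
                continue
            if best is None or score(h, remaining) > score(best, remaining):
                best = h                               # step 5: model selection
            if good_enough(best, remaining):           # step 6: stopping criterion
                break
        if best is None:
            break                                      # nothing acceptable was found
        if not any(best(e) for e in remaining):
            break                                      # no progress: avoid looping
        theory.append(best)
        remaining = [e for e in remaining if not best(e)]   # step 7
    return theory, remaining
```

The two early exits are safeguards added for the sketch; they return the examples that no accepted hypothesis explains.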



We propose an improvement to step 4 where a hypothesis is checked if it 
is a satisfactory approximation of the underlying process that generated data. 
We propose the use of statistical tests in that model validation 1 step. Step 5 is 
improved to mitigate the over-fitting problem, which the fragmented structure 



1 Model Validation term is used in the statistics sense. 
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of the induced theories suggests, using a model selection criterium. Our proposal 
for step 6 is inspired on the PAC [6] method to evaluate learning performance. 

We use the terms hypothesis, model and theory with the following meanings in 
this paper. A hypothesis is a conjecture made after a specific observation and 
before any empirical evaluation has been performed. A model is a hypothesis that 
has passed the model validation tests and therefore has at least limited validity 
for predicting new observations. A theory is a set of hypotheses whose 
prediction capabilities have been confirmed through empirical evaluation. 
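
The cycle of Algorithm 1 can be sketched in Python as follows. This is only a minimal illustration and not the IndLog implementation: the callables `candidates`, `covers`, `validate`, `score` and `stop` are hypothetical placeholders for hypothesis synthesis, coverage testing, Model Validation, Model Selection and the stopping criterion, and lower scores are assumed to be better.

```python
from typing import Callable, Iterable, List, Optional, Set, TypeVar

H = TypeVar("H")  # hypothetical hypothesis (clause) type


def greedy_cover(
    examples: Set[int],
    candidates: Callable[[Set[int]], Iterable[H]],
    covers: Callable[[H, Set[int]], Set[int]],
    validate: Callable[[H, Set[int]], bool],
    score: Callable[[H, Set[int]], float],
    stop: Callable[[H, Set[int]], bool],
) -> List[H]:
    """Return a theory (list of accepted hypotheses) for the examples."""
    theory: List[H] = []
    remaining = set(examples)
    while remaining:                                  # steps 1 and 8
        best: Optional[H] = None
        best_score = float("inf")
        for h in candidates(remaining):               # step 3: synthesise
            if not validate(h, remaining):            # step 4: model validation
                continue
            s = score(h, remaining)                   # step 5: model selection
            if s < best_score:
                best, best_score = h, s
            if best is not None and stop(best, remaining):  # step 6
                break
        if best is None:
            break                                     # no acceptable hypothesis
        explained = covers(best, remaining)
        if not explained:
            break                                     # guard against looping forever
        theory.append(best)
        remaining -= explained                        # step 7
    return theory
```

Passing the validation, selection and stopping steps as functions keeps the loop itself unchanged when any of the three proposals is switched on or off.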



3 Model Validation 

In most applications the true nature of the model is unknown. It is therefore 
of fundamental importance to assess the goodness-of-fit of each conjectured 
hypothesis. This is performed in a step of the induction process called Model 
Validation. Model Validation allows the system to check whether the hypothesis is 
indeed a satisfactory model of the data. This step is common to both Machine 
Learning and Statistical Inference. 

There are various ways of checking whether a model is satisfactory. The most 
common approach is to examine the residuals. The residuals are the random process 
formed from the differences between the observed and predicted values of a 
variable. The behaviour of the residuals may be used to check the adequacy of the 
fitted model as a consequence of Wold’s theorem, stated as follows. 

Theorem 1 (Wold’s Theorem). Any real-valued stationary process may be 
decomposed into two different parts. The first is totally deterministic. The second 
is totally stochastic. The stochastic part of the process may be written as a 
sequence of serially uncorrelated random variables $z$ with zero mean and 
variance $\sigma^2$. The stationarity condition implies $\sigma < \infty$, thus 
$z$ is a White Noise (WN) process: 

$$z \sim WN(0, \sigma^2) \tag{1}$$

According to condition (1) of Wold’s theorem, if the fitted model belongs 
to the set of “correct” functional classes, the residuals should behave like a white 
noise process with zero mean and constant variance. Hypotheses whose residuals 
do not comply with condition (1) may be rejected using specific statistical tests 
that check randomness. The Ljung-Box test [7] is one such test. The null 
hypothesis of the Ljung-Box test is a strict white noise process, that is, the 
residuals are independent and identically distributed (i.i.d.). According to the 
definition of statistical independence, such residuals are incompressible. 
Muggleton and Srinivasan [8] have also proposed checking noise incompressibility 
for evaluating hypothesis significance, but in the context of classification problems. 
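
As an illustration of how such a residual check might look, the sketch below computes the Ljung-Box statistic Q = n(n+2) Σ ρ_k² / (n − k) over the first few lags and compares it with a tabulated 5% chi-square critical value. The function names, the choice of at most five lags and the use of the lag count as degrees of freedom are assumptions made for the example; they are not taken from the paper.

```python
from typing import Sequence


def autocorrelation(z: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of z at the given lag."""
    n = len(z)
    mean = sum(z) / n
    denom = sum((v - mean) ** 2 for v in z)
    num = sum((z[t] - mean) * (z[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(z: Sequence[float], lags: int) -> float:
    """Ljung-Box statistic over lags 1..lags."""
    n = len(z)
    return n * (n + 2) * sum(
        autocorrelation(z, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )


# Upper 5% points of the chi-square distribution for 1..5 degrees of freedom.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def passes_validation(z: Sequence[float], lags: int = 5) -> bool:
    """True if white noise is not rejected at the 5% level (lags in 1..5)."""
    return ljung_box_q(z, lags) <= CHI2_CRIT_95[lags]
```

A hypothesis whose residuals fail this check would be discarded in step 4 of Algorithm 1. Testing several lags matters: a periodic residual pattern can look uncorrelated at lag 1 and still be strongly correlated at lag 2.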

Other statistical tests may be incorporated to check our assumptions regarding 
the error structure, such as tests for normality. The use of residuals for model 
assessment is a very general method which applies to many situations. 



4 Stopping Criterion 

The proposed stopping criterion is based on the PAC [6] method for evaluating 
learning performance: $P(|z| > \epsilon) < \delta$. The search stops 
whenever the probability that the error is greater than the accuracy ($\epsilon$) 
is less than the confidence parameter ($\delta$). Different degrees of “goodness” 
correspond to different values of $\epsilon$ and $\delta$. 

In this section we propose to calculate the bound, $\delta$, for any unknown 
distribution. Theorem 2 proves the existence of that bound for a single clause 
(hypothesis) and provides a procedure to calculate the error probability for a 
given accuracy level. Corollary 1 generalises the bound on the error probability 
to multi-clausal theories. 

Theorem 2 (Bounding Error Probability of a Hypothesis). Let $z$ be 
the residuals from the hypothesis $h_i$. Assume $z$ is independent and identically 
distributed (i.i.d.) with distribution variance $\sigma^2$. Then the probability of 
the error being greater than $\epsilon$ is bounded by: 

$$P(|z| > \epsilon \mid h_i) < \delta, \qquad \delta = \frac{\sigma^2}{\epsilon^2} \tag{2}$$

Proof: Let the residuals $z_1, z_2, \ldots, z_n$ be a sequence of i.i.d. random 
variables, each with finite mean $\mu$ and variance $\sigma^2$. If 
$\bar{z} = (z_1 + \ldots + z_n)/n$ is the average of $z_1, z_2, \ldots, z_n$, then 
it follows from the weak law of large numbers [7] that: 

$$\bar{z} \xrightarrow{p} \mu \tag{3}$$

Let the sample variance be 
$S_n = \frac{1}{n}\sum_{j=1}^{n}(z_j - \bar{z})^2 = \frac{1}{n}\sum_{j=1}^{n} z_j^2 - \bar{z}^2$, 
where $\bar{z}$ is the sample average. It follows from Slutsky’s lemma [7] that: 

$$S_n \xrightarrow{p} \sigma^2 \tag{4}$$

Assuming that the residuals $z$ of the hypothesis $h_i$ pass the Ljung-Box test 
(that is, its null hypothesis is not rejected), they comply with a strict white 
noise process with zero mean and finite variance, yielding: 

$$\mu = 0, \qquad \sigma < \infty \tag{5}$$

Following conditions (3) and (4), each observation may be considered drawn 
from the same ensemble distribution. Thus, the sample mean and variance of 
the joint distribution converge to the ensemble mean and variance. Moreover, 
condition (5) states that both values are finite and, therefore, for all 
$\epsilon > 0$, Chebyshev’s inequality bounds the probability of the residual 
value, $z$, being greater than $\epsilon$ by: 

$$P(|z| > \epsilon \mid h_i) < \frac{\sigma^2}{\epsilon^2} \tag{6}$$

Corollary 1 (Bounding Error Probability of a Theory). Let $H$ be a set 
of hypotheses (clauses) that describes a given theory $T$. Assume: 

$$P(|z| > \epsilon \mid h_i) < \delta, \quad \forall h_i \in H \tag{7}$$

then, for theory $T$, the probability of the error, $z$, being greater than 
$\epsilon$ is also bounded by $P(|z| > \epsilon) < \delta$. 

We recall that just one clause is activated at a time; thus all clauses of a 
non-recursive theory are mutually exclusive regarding example coverage, i.e. 

$$h_i \cap h_j = \emptyset, \quad i \neq j \tag{8}$$

We also recall that the prior probability of $h_i$, $P(h_i)$, may be estimated by 
calculating the frequency of $h_i$ on the training set and dividing it by the 
coverage of the theory. Because the sum of the frequencies of all hypotheses is 
equal to the theory coverage, 

$$\sum_{\forall h_i \in H} P(h_i) = 1 \tag{9}$$

Proof: Let conditions (8) and (9) hold; then it follows from the total probability 
theorem that: 

$$P(|z| > \epsilon) = \sum_{\forall h_i \in H} P(|z| > \epsilon \mid h_i)\, P(h_i). \tag{10}$$

Let condition (7) hold; then we may bound $P(|z| > \epsilon \mid h_i)$ by $\delta$ 
in equation (10), yielding: 
$P(|z| > \epsilon) < \delta \sum_{\forall h_i \in H} P(h_i)$. Since 
$\sum_{\forall h_i \in H} P(h_i) = 1$ and $P(|z| > \epsilon \mid h_i) < \delta$, 
$\forall h_i \in H$, then: 

$$P(|z| > \epsilon) < \delta \tag{11}$$
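
A small sketch of how the bounds of Theorem 2 and Corollary 1 might be evaluated in practice is given below. It assumes that the residual variance is estimated by the sample variance, and the function names are hypothetical; the paper does not prescribe this code.

```python
from typing import Sequence


def clause_delta(residuals: Sequence[float], epsilon: float) -> float:
    """Chebyshev bound sigma^2 / epsilon^2 on P(|z| > epsilon | h_i)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((z - mean) ** 2 for z in residuals) / n  # S_n in the proof
    return min(1.0, variance / epsilon ** 2)  # a probability cannot exceed 1


def theory_delta(clause_deltas: Sequence[float],
                 clause_counts: Sequence[int]) -> float:
    """Total-probability combination of clause bounds.

    P(h_i) is estimated as the clause frequency divided by the coverage
    of the theory, so the priors sum to one.
    """
    total = sum(clause_counts)
    return sum(d * c / total for d, c in zip(clause_deltas, clause_counts))


def should_stop(residuals: Sequence[float], epsilon: float, delta: float) -> bool:
    """Stopping test: the estimated bound falls below the requested delta."""
    return clause_delta(residuals, epsilon) < delta
```

Because the priors are non-negative and sum to one, the value returned by `theory_delta` is never larger than the largest clause bound, which is the content of the corollary.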



5 Model Selection 

The evaluation of conjectured hypotheses is central to the search process in ILP. 
Given a set of hypotheses of the underlying process that generated the data, we 
wish to select the one that best approximates the “true” process. The process of 
evaluating candidate hypotheses is termed model selection in statistical inference. 

A simple approach to model selection is to choose the hypothesis that gives 
the most accurate description of the data, for example, the hypothesis 
that minimises the RMSE. However, model selection is disturbed by the presence 
of noise in the data, leading to the problem of over-fitting: a hypothesis with 
a larger number of adjustable parameters has more flexibility to capture complex 
structures in the data, but also to fit noise. Hence, any criterion for model 
selection should establish a trade-off between descriptive accuracy and hypothesis 
complexity. 

5.1 Hypothesis Complexity 

Defining a theoretically well-justified measure of model complexity is a central 
issue in model selection that is yet to be fully understood. In Machine Learning, 
some authors have advanced their own definitions of complexity. Muggleton [9] 
proposes a complexity measure based on the number of bits necessary to encode 
a hypothesis. Džeroski [10] proposes a complexity measure based on the length 
of a grammar sentence in the LAGRAMGE system. Both of these complexity 
measures are sensitive to the functional form of the hypothesis, since both 
penalise each literal added. We argue that the functional form is not a good 
basis for measuring the complexity of a real-valued hypothesis, since any 
real-valued function can be accurately approximated using a single functional 
class. This follows directly from Approximation Theory, as Kolmogorov’s 
superposition theorem illustrates. 

Theorem 3 (Kolmogorov Superposition Theorem). Any continuous 
multidimensional function $f(x_1, \ldots, x_m)$ can be represented as the sum of 
$2m + 1$ functions. These functions are called universal functions because they 
depend only on the dimensionality $m$ and not on the functional form of $f$. 

Following Theorem 3, the number of universal functions in the sum grows with the 
dimensionality $m$. This highlights the role of dimensionality in a definition of 
hypothesis complexity. A few arguments from computational complexity and 
estimation theory also support this claim. Since the machine learning algorithm is 
given a finite dataset, models with fewer adjusted parameters will be easier to 
optimise, since they will generically have fewer misleading local minima in the 
error surfaces associated with the estimation. They will also be less prone to the 
curse of dimensionality and will require less computational time to manipulate. 
A model with fewer degrees of freedom will generically be less able to fit 
statistical artifacts in small data sets and will therefore be less prone to the 
so-called “generalisation error”. Finally, several authors (Akaike [11]; Efron [12]; 
Ye [13]) have proposed measures of model complexity which in general depend on 
the number of adjusted parameters. Consequently, the measure of complexity 
adopted in this work is the number of parameters adjusted to the data. 

5.2 Model Selection Criteria 

There are several model selection criteria that use the adopted measure of 
model complexity. Among these are: (i) the Akaike Information Criterion 
(AIC) [11], defined as $AIC = -2\ln(L) + 2k$; (ii) the Akaike Information 
Criterion Corrected for small sample bias (AICC) [11], defined as 
$AICC = -2\ln(L) + \frac{2kn}{n-k-1}$; (iii) the Bayesian Information Criterion 
(BIC) [14], defined as $BIC = -\ln(L) + \ln(n)k$; and (iv) the Minimum 
Description Length (MDL) [11], defined as 
$MDL = -\ln(L) + \frac{k}{2}\ln(n) + (\frac{k}{2} + 1)\ln(k + 1)$. 

The estimation of the likelihood function, $L$, of a hypothesis with $k$ adjusted 
parameters requires considerable computational effort and the assumption of 
prior distributions. In this context, the Gaussian distribution plays an important 
role in the characterisation of the noise, fundamentally due to the central limit 
theorem. Assuming the error is i.i.d. and drawn from a Gaussian distribution, the 
log-likelihood of a hypothesis given the data [11] is 
$\ln(L) = -\frac{n}{2}\left(1 + \ln(2\pi) + \ln(\hat{\sigma}^2)\right)$, 
where $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} z_i^2$, $n$ is the sample size and 
$z$ are the residuals of the induced hypothesis. Since the compared hypotheses 
may have different coverages, the criteria were normalised by the sample size, as 
suggested by Box and Jenkins [15] (p. 201). This approach has the advantage of 
indirectly biasing the search to favour hypotheses with higher coverage and, 
consequently, theories with fewer clauses. 
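
To make the computation concrete, the following sketch evaluates the Gaussian log-likelihood from the residuals and derives AIC and BIC from it, optionally normalised by the sample size. The function names are hypothetical, and the criteria are written in their common textbook scaling (−2 ln L + 2k and −2 ln L + k ln n), which may differ by a constant factor from the scaling used in the text.

```python
import math
from typing import Sequence


def gaussian_log_likelihood(residuals: Sequence[float]) -> float:
    """ln(L) = -(n/2) * (1 + ln(2*pi) + ln(sigma2)), sigma2 = mean(z^2)."""
    n = len(residuals)
    sigma2 = sum(z * z for z in residuals) / n
    if sigma2 == 0.0:
        raise ValueError("residual variance is zero; ln(L) is unbounded")
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(sigma2))


def aic(residuals: Sequence[float], k: int, normalise: bool = True) -> float:
    """Akaike criterion for k adjusted parameters (textbook scaling)."""
    n = len(residuals)
    value = -2.0 * gaussian_log_likelihood(residuals) + 2.0 * k
    return value / n if normalise else value


def bic(residuals: Sequence[float], k: int, normalise: bool = True) -> float:
    """Bayesian criterion for k adjusted parameters (textbook scaling)."""
    n = len(residuals)
    value = -2.0 * gaussian_log_likelihood(residuals) + k * math.log(n)
    return value / n if normalise else value
```

With `normalise=True` the score is divided by the number of covered examples, so hypotheses with different coverages can be compared on the same scale.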

Analytical model selection criteria like AIC and BIC are asymptotically 
equivalent to leave-one-out and leave-v-out cross-validation [16]. However, they 
have the advantage of being incorporated in the cost function, which reduces 
the computational effort. 

Other authors have presented similar work in this area. Železný [17] derives a 
model selection criterion under similar assumptions that uses Muggleton’s 
complexity measure, which, according to the adopted definition of complexity, is 
unsuitable for our purposes. It also requires the calculation of the “generality” 
function for each induced hypothesis. His formulation does not estimate the 
modal value of the likelihood, so the final equation includes the usually unknown 
nuisance parameter of the hypothesis, which limits its practical use. 

5.3 Choosing a Model Selection Criterion 

Model selection criteria have different characteristics; thus, it is essential to 
clarify the conditions under which each applies to numerical problems in ILP. 

The use of AIC is recommended when the data-generating function is not among 
the candidate hypotheses and the number of models of the same dimension 
does not grow very fast with the dimension. In that case, the average squared 
error of the model selected by AIC is asymptotically equivalent to the smallest 
one offered by the candidate models [16]. Otherwise, AIC cannot be asymptotically 
optimal, since it increases model complexity as more data is supplied [14]. 

The use of BIC and other dimension-consistent criteria like MDL is advisable 
when the correct model is among the candidate hypotheses. In that case, the 
probability of selecting the true model by BIC approaches 1 as $n \to \infty$. 
Otherwise, BIC has a bias for choosing oversimplified models [16]. 

Since in machine learning, except in rare instances, the true model is never 
known, or when it is known it is usually intractable, AIC-like criteria are preferable. 

6 Experimental Evaluation 

This section presents empirical evidence for the proposals made in this paper. 
We propose to use an ILP system to learn a new time series model for prediction. 
In this sense, the experiment was inspired by Colton and Muggleton’s [18] 
application of an ILP system to scientific discovery tasks. It is also related to 
Železný’s [17] work, since it learns a numeric function. The datasets used were 
collected from the statistics and time series literature. For each dataset we 
compared our results with the published ones. 

6.1 Datasets 

The datasets used are: Canada’s Industrial Production Index [19]; the USA 
unemployment rate [20]; the ECG of a patient with sleep apnea [21]; and the VBR 
traffic of an MPEG video [22]. These datasets consist of facts that relate time 
to an output variable. Time is expressed in discrete intervals and the output is 
a real-valued variable. The mode declaration for the head literal is of the form: 
timeseries(+Time, -Output). 

6.2 Benchmark Models 

In this experiment we compared theories induced by the raw IndLog system 
with the following models: IndLog with Model Validation and Model Selection 
activated (IndLog MVS); Auto-Regressive Integrated Moving Average (ARIMA); 
Threshold Auto-Regressive (TAR); Markov Switching Autoregressive (MSA); 
Autoregressive model with Multiple Structural Changes (MSC); Self-Exciting 
Threshold Auto-Regressive (SETAR); Markov Switching regime-dependent 
Intercepts, Autoregressive parameters and (H)variances (MSIAH); Markov 
Switching regime-dependent Means and (H)variances (MSMH); Bivariate 
Auto-Regressive models (Bivariate AR); and Radial Basis Function Networks (RBFN). 

6.3 Learning Task Description 

The experiment consists of learning a modified class of the TAR [23] model using 
an ILP system. The model extends the TAR model by adding extra degrees 
of freedom that are estimated at run-time. The induced clauses are of the 
form: timeseries(T, X) ← inInterval($R_m$, T, D, $R_{m+1}$), armodel(T, P, X), 
where the variable X is the observed time series, $m$ is the index denoting the 
sub-region, $R_m$ denotes the threshold amplitude of region $m$ and armodel is the 
background knowledge literal for the AR model [23]. The main difference from 
the original TAR structure is that, instead of a single D value, we have one D 
for each sub-region. The learning task consists of estimating: (i) the number of 
parameters p of each AR sub-model; (ii) the time delay D of each sub-model; 
and (iii) the thresholds $R_m$ and $R_{m+1}$ that bound each sub-region. 
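
The structure of such a theory can be illustrated with the sketch below. It rests on assumptions that the text does not spell out: that inInterval holds when the value observed D steps before T lies in [R_m, R_{m+1}), and that each AR sub-model is a plain linear combination of the most recent values without an intercept. The names `Regime`, `EXAMPLE_REGIMES` and `predict` are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Regime(NamedTuple):
    lower: float               # threshold R_m
    upper: float               # threshold R_{m+1}
    delay: int                 # delay D, specific to this sub-region
    coeffs: Sequence[float]    # AR coefficients; the order is len(coeffs)


# Hypothetical two-regime theory, used only for illustration.
EXAMPLE_REGIMES = [
    Regime(-math.inf, 2.5, 2, (0.5,)),
    Regime(2.5, math.inf, 1, (1.0, -1.0)),
]


def predict(series: Sequence[float], t: int,
            regimes: Sequence[Regime]) -> Optional[float]:
    """Predict the value at time t (t <= len(series)) from earlier values.

    The first regime whose interval contains series[t - delay] fires;
    None is returned when no regime applies.
    """
    for r in regimes:
        order = len(r.coeffs)
        if t - r.delay < 0 or t - order < 0:
            continue  # not enough history for this regime
        if r.lower <= series[t - r.delay] < r.upper:
            return sum(c * series[t - 1 - i] for i, c in enumerate(r.coeffs))
    return None
```

Each `Regime` plays the role of one induced clause, and allowing a different `delay` per regime is the extra degree of freedom with respect to the original TAR structure.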

6.4 Results Summary and Discussion 

This section presents the results obtained for each dataset of Section 6.1. Those 
datasets were studied in several papers, using different classes of models. 
All time series datasets in Table 1 have an AR model that may be used as a 
reference across datasets. The recall values for the Unemployment, Production, 
VBR Traffic and ECG datasets are, respectively, 100%, 96%, 94% and 78%. 

The ILP system consistently induced the models with the best forecasting 
performance on all datasets studied. This allows us to conclude that the proposed 
modifications to the basic ILP search process make an ILP system suited to 
time series forecasting and to discovering new time series models. 

Table 1. Summary of results: Relative RMSE of the ILP algorithm and other 
benchmark models for the selected datasets 

Model            Unemployment   Production   VBR Traffic   ECG 
IndLog MVS       0.91           0.85         0.93          0.82 
IndLog           1.11           1.04         0.96          0.93 
AR               1.04           0.98         0.94          0.97 
SARIMA           1.00           1.00         -             - 
MSA              1.19           -            -             - 
MSC              -              1.00         -             - 
MSMH             -              0.98         -             - 
MSIAH            -              1.20         -             - 
SETAR            -              1.19         -             - 
TAR              1.00           -            1.00          - 
RBFN             -              -            -             1.00 
Bivariate AR     1.20           -            -             - 
Benchmark RMSE   1.59E-1        4.44E-3      12.93E3       4.53 



7 Related Work 

Other approaches to the task of learning numerical relationships may be found in 
the ILP literature. FORS [24] integrates feature construction into linear 
regression modelling. The ILP system IndLog [25] presented mechanisms for coping 
with large numbers of examples, including noisy numerical data without negative 
examples, and the capability to adjust model parameters at induction time. 
Equation discovery systems like LAGRAMGE [10] allow the user to specify the 
space of possible equations using a context-free grammar. TILDE [5] has the 
capability of performing regression-like tasks. 



8 Conclusions 

In this paper we have proposed improvements to the numerical reasoning 
capabilities of ILP systems. The improvements proposed are: model validation; 
model selection criteria; and a stopping criterion. 

Our proposals were incorporated in the IndLog [25] ILP system and evaluated 
on time series modelling problems. The ILP results were better than those of other 
statistics-based time series prediction methods. The ILP system discovered a new 
switching model based on the possibility of varying the delay in the activation 
rule of each sub-model of a TAR model. 

The proposals made for model validation, model selection and for measuring 
the learning performance can be generalised to other machine learning techniques 
dealing with numerical reasoning. 



Abstract. The ICA mixture model was originally proposed to perform 
unsupervised classification of data modelled as a mixture of classes described 
by linear combinations of independent, non-Gaussian densities. 
Since the original learning algorithm is based on a gradient optimization 
technique, its performance is affected by some known 
limitations associated with this kind of approach. In this paper, improvements 
based on implementation and modelling aspects are incorporated 
into the ICA mixture model, aiming to achieve better classification results. 
Comparative experimental results obtained by the enhanced method and 
the original one are presented to show that the proposed modifications 
can significantly improve the classification performance on randomly 
generated data and the well-known Iris flower data set. 



1 Introduction 

Classification and grouping of patterns are problems commonly found in many 
branches of science such as biology, medicine, computer vision and artificial 
intelligence [7]. In supervised classification, each pattern in a data set is 
identified as a member of a predefined class. On the other hand, an unsupervised 
classification algorithm assigns each pattern to one class based on statistics 
only, without any knowledge about training classes. 

An approach for unsupervised classification is based on mixture models (see, 
for example, [5]) where the data distribution is modelled as a weighted sum of 
class-conditional densities. For example, in the case of a Gaussian mixture model, 
the data in each class is assumed to have a multivariate Gaussian distribution. 
However, this assumption implies that the Gaussian mixture model exploits only 
second order statistics (mean and covariance) of the observed data to estimate 
the posterior densities. 

In recent years, Independent Component Analysis (ICA) has been widely applied 
in many research fields due to the fact that it exploits higher order statistics 
in data [6], [4], [2]. In fact, ICA is a generalization of Principal Component 
Analysis (PCA), in the sense that ICA transforms the original variables into 
independent components instead of just uncorrelated ones, as in the case of PCA. 



C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 205-214, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004 

The ICA Mixture Model (ICAMM) was proposed by Lee and coworkers in 
[9], aiming to overcome one limitation of ICA, namely the assumption that the 
sources are independent. In this approach, the assumption was relaxed by using 
the concept of mixture models. 

Each class in ICAMM is described by a linear combination of independent 
sources with non-Gaussian densities. The algorithm finds the independent 
components and the mixing matrix for each class and also computes the class 
membership probability for each pattern in the data set. The learning rules for 
ICAMM were derived using gradient ascent to maximize the log-likelihood of the data. 

Although some promising features of ICAMM have been reported in [9], 
we observed, in experiments with randomly generated data and the Iris data set, 
very slow convergence and poor classification results. In two papers found in the 
literature [11], [12], ICAMM was also applied without great success. In [11], 
ICAMM was shown to work well on 2D artificial data, but the advantage of 
using such a model in image segmentation was not proven conclusively. In [12], some 
feature extraction techniques were considered as preprocessing steps for reducing 
the data dimensionality and increasing the efficiency of ICAMM. Although the 
mean overall classification accuracies obtained by their approach are higher than 
those obtained by the k-means method, the authors pointed out several limitations 
and assumptions that compromise the use of ICAMM in remote sensing classification. 

In an attempt to improve the performance of ICAMM, this work introduces 
the Enhanced ICA Mixture Model (EICAMM), which implements some 
modifications to the learning rules of the original ICAMM algorithm. As in the 
original ICAMM, some modifications are also based on an information-maximization 
approach proposed by Bell and Sejnowski in [2]. In that paper, they introduce 
a new self-organizing algorithm which maximizes the information transferred in 
a network of non-linear units. 

Another problem associated with ICAMM is related to the fact that its 
learning algorithm is based on a gradient optimization technique. Its 
performance is therefore affected by some known limitations associated 
with this kind of approach. The gradient ascent (descent) algorithm has become 
well known in the literature as a standard method for training neural networks 
[10]. The widespread use of this technique is mainly related to its most powerful 
property: it can be mathematically proven that the algorithm will always converge 
to a local minimum of the objective function, although an immense number of 
iterations is often necessary. Besides this point, another important problem to 
be solved is that there is no guarantee that the method will not get stuck in a 
local minimum. 

Aiming to attenuate these problems, additional improvements to the original 
model were made by incorporating some features of nonlinear optimization 
methods based on second derivatives of the objective function. In this sense, the 
Levenberg-Marquardt method (see, for example, [10]) was incorporated into the 
learning algorithm to guarantee and improve the convergence of the model. 

In order to evaluate the efficiency of the proposed modifications, some results 
obtained by the Enhanced ICA Mixture Model (EICAMM) and those obtained 
by the original one are presented and discussed in a comparative study. From 
this discussion, one can see that the proposed modifications can significantly 
improve the classification performance on randomly generated data and 
the well-known Iris flower data set. 

This paper is organized as follows. In Section 2, basic concepts of ICA are 
presented. The original ICA mixture model is presented in Section 3. In Section 
4, the Enhanced ICA Mixture Model is introduced. Experimental results are 
presented and discussed in Section 5. Finally, some conclusions and future work 
are presented in Section 6. 



2 Independent Component Analysis (ICA) 

The noise-free ICA model can be defined as the following linear model: 

$$x = As \tag{1}$$

where the $n$ observed random variables $x_1, x_2, \ldots, x_n$ are modelled as 
linear combinations of $n$ random variables $s_1, s_2, \ldots, s_n$, which are 
assumed to be statistically mutually independent. 

In such a model, the independent components $s_j$ cannot be directly observed 
and the mixing coefficients $a_{ij}$ are also assumed to be unknown. Only the random 
variables $x_i$ are observed and both the components $s_j$ and the coefficients 
$a_{ij}$ must be estimated using $x$. 

The independent components $s_j$ must have non-Gaussian distributions. 
Kurtosis is commonly used as a measure of non-Gaussianity in the ICA estimation 
of the original sources. For a zero-mean random variable $y$ with unit variance, 
this measure is given by: 

$$\mathrm{kurt}(y) = E\{y^4\} - 3 \tag{2}$$

A Gaussian variable has zero kurtosis. Random variables with positive 
kurtosis have a super-Gaussian distribution and those with negative kurtosis 
have a sub-Gaussian distribution, as illustrated in Figure 1. 
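
For illustration, the kurtosis of a sample can be estimated as in the sketch below. The sample is standardised first, so the unit-variance form of Equation (2) applies; the function name is hypothetical.

```python
from typing import Sequence


def kurtosis(y: Sequence[float]) -> float:
    """Sample excess kurtosis: E{y^4} - 3 after standardising y."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    if var == 0.0:
        raise ValueError("kurtosis is undefined for a constant sample")
    fourth = sum((v - mean) ** 4 for v in y) / n
    return fourth / var ** 2 - 3.0
```

A two-valued sample such as -1, 1, -1, 1, ... gives a negative value (sub-Gaussian), while a sample concentrated at zero with a few large outliers gives a positive value (super-Gaussian).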




Fig. 1. Examples of Gaussian, super-Gaussian and sub-Gaussian probability density 
functions. Solid line: Laplacian density. Dashed line: a typical moderately 
super-Gaussian density. Dash-dotted line: a typical strongly super-Gaussian 
density. Dotted line: Gaussian density [6] 

It is also assumed that the unknown mixing matrix A is square. This simplifies 
the estimation process, since after estimating matrix A it is possible to compute 
its inverse W and obtain the independent components simply by: 







P.R. Oliveira and R.A.F. Romero 



$$s = Wx \tag{3}$$

Therefore, the goal of ICA is to find a linear transformation $W$ of the sensor 
signals $x$ that makes the outputs $u$ as independent as possible: 

$$u_t = W x_t = W A s_t \tag{4}$$

so that $u$ is an estimate of the source vector. 

There are several methods for adapting the mixing matrices in the ICA model 
[4], [3], [6], [8]. In [8], the extended infomax ICA learning rule is used to blindly 
separate unknown sources with sub-Gaussian and super-Gaussian distributions. 
In this approach, maximizing the log-likelihood of the data X gives the following 
learning rule for W: 



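$$\Delta W \propto \left[ I - \varphi(u) u^T \right] W \tag{5}$$

where $I$ is the identity matrix and 

$$\varphi(u) = -\frac{\partial p(u) / \partial u}{p(u)} = \left[ -\frac{\partial p(u_1) / \partial u_1}{p(u_1)}, \ldots, -\frac{\partial p(u_N) / \partial u_N}{p(u_N)} \right]^T \tag{6}$$
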

A strictly symmetric sub-Gaussian bimodal distribution can be modelled by 
a weighted sum of two Gaussian distributions with mean $\mu$ and variance 
$\sigma^2$, given as: 

$$p(u) = \frac{1}{2}\left( N(\mu, \sigma^2) + N(-\mu, \sigma^2) \right) \tag{7}$$

For $\mu = 1$ and $\sigma^2 = 1$, Equation (6) reduces to: 

$$\varphi(u) = u - \tanh(u) \tag{8}$$

which leads to the learning rule for strictly sub-Gaussian sources: 

$$\Delta W \propto \left[ I + \tanh(u) u^T - u u^T \right] W \tag{9}$$

For unimodal super-Gaussian sources, the following density model is adopted: 

$$p(u) \propto N(0, 1)\, \mathrm{sech}^2(u) \tag{10}$$

In this case, Equation (6) reduces to: 

$$\varphi(u) = u + \tanh(u) \tag{11}$$

which leads to the following learning rule for strictly super-Gaussian sources: 

$$\Delta W \propto \left[ I - \tanh(u) u^T - u u^T \right] W \tag{12}$$

Therefore, using Equations (9) and (12), one can obtain a generalized 
learning rule, in which the switching criterion for distinguishing between 
sub-Gaussian and super-Gaussian sources is the sign before the hyperbolic 
tangent function: 

$$\Delta W \propto \left[ I - K \tanh(u) u^T - u u^T \right] W \tag{13}$$

where $K$ is an $N$-dimensional diagonal matrix composed of elements $k_i$ 
calculated as: 

$$k_i = \mathrm{sign}(\mathrm{kurt}(u_i)) \tag{14}$$
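
A single update of the generalized rule in Equation (13) can be sketched as follows for one observation. This is a plain-Python illustration with list-based matrices; the learning rate `lr` and the function names are assumptions, and the signs in `k_signs` are taken to have been computed beforehand from the kurtosis of each output, as in Equation (14).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two list-based matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def infomax_step(w: Matrix, x: Sequence[float],
                 k_signs: Sequence[int], lr: float = 0.01) -> Matrix:
    """Return W + lr * [I - K tanh(u) u^T - u u^T] W for one sample x."""
    n = len(w)
    u = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]  # u = W x
    bracket = [
        [(1.0 if i == j else 0.0)
         - k_signs[i] * math.tanh(u[i]) * u[j]   # K tanh(u) u^T
         - u[i] * u[j]                           # u u^T
         for j in range(n)]
        for i in range(n)
    ]
    delta = matmul(bracket, w)
    return [[w[i][j] + lr * delta[i][j] for j in range(n)] for i in range(n)]
```

With `k_signs[i] = 1` the update for output i follows the super-Gaussian rule (12), and with `k_signs[i] = -1` it follows the sub-Gaussian rule (9).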

3 The ICA Mixture Model (ICAMM) 

In ICAMM, it is assumed that the data $X = \{x_1, \ldots, x_T\}$ are generated by a 
mixture density model. The likelihood of the data is given by the joint density 

$$p(X \mid \Theta) = \prod_{t=1}^{T} p(x_t \mid \Theta). \tag{15}$$

The mixture density is 

$$p(x_t \mid \Theta) = \sum_{k=1}^{K} p(x_t \mid C_k, \theta_k)\, p(C_k), \tag{16}$$

where $\Theta = (\theta_1, \ldots, \theta_K)$ are the unknown parameters for each 
$p(x \mid C_k, \theta_k)$. In this case, $C_k$ denotes class $k$ and it is assumed 
that the number of classes $K$ is known in advance. The data in each class are 
described by: 

$$x_t = A_k s_k + b_k \tag{17}$$

where $A_k$ is an $N \times N$ scalar matrix and $b_k$ is the bias vector for 
class $k$. The vector $s_k$ is called the source vector. 

The task is to classify the unlabelled data points and to determine the 
parameters for each class $(A_k, b_k)$ and the class probability for each data point. 

The iterative learning algorithm which performs gradient ascent on the total 
likelihood of data has the following steps: 



1. Compute the log-likelihood of the data for each class: 

$$\log p(x_t \mid C_k, \theta_k) = \log p(s_k) - \log\left(|\det(A_k)|\right) \tag{18}$$

In ICAMM, $s_k = A_k^{-1}(x_t - b_k)$. 

2. Compute the probability for each class given the data vector $x_t$: 

$$p(C_k \mid x_t, \Theta) = \frac{p(x_t \mid C_k, \theta_k)\, p(C_k)}{\sum_{k} p(x_t \mid C_k, \theta_k)\, p(C_k)} \tag{19}$$

3. Adapt the matrices $A_k$ and the bias terms $b_k$ using the updating rules: 

$$\Delta A_k \propto -p(C_k \mid x_t, \Theta)\, A_k \left[ I - K \tanh(s_k) s_k^T - s_k s_k^T \right] \tag{20}$$

$$b_k = \frac{\sum_t x_t\, p(C_k \mid x_t, \Theta)}{\sum_t p(C_k \mid x_t, \Theta)} \tag{21}$$

For the log-likelihood estimation in Equation (18), $\log p(s_k)$ is modelled by: 

$$\log p(s_k) \propto -\sum_{i=1}^{N} \left( k_{k,i} \log(\cosh s_{k,i}) + \frac{s_{k,i}^2}{2} \right) \tag{22}$$

The learning algorithm converges when the change in the log-likelihood function 
between two successive iterations is below a predetermined small constant. 
After convergence, the classification process is carried out by processing each 
instance with the learned parameters $A_k$ and $b_k$. The class-conditional 
probabilities $p(C_k \mid x_t, \Theta)$ are computed for all classes and the 
corresponding instance is assigned to the class with the highest conditional probability. 
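
The class assignment described above can be sketched as follows. The code assumes that the per-class log-likelihoods log p(x_t | C_k, θ_k) have already been computed, and it normalises them in the log domain to avoid numerical underflow; this is an implementation choice for the example, not something stated in the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def class_posteriors(log_likelihoods: Sequence[float],
                     priors: Sequence[float]) -> List[float]:
    """p(C_k | x_t) from log p(x_t | C_k) and the class priors p(C_k)."""
    joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(joint)                       # subtract the maximum for stability
    weights = [math.exp(j - shift) for j in joint]
    total = sum(weights)
    return [w / total for w in weights]


def classify(log_likelihoods: Sequence[float],
             priors: Sequence[float]) -> int:
    """Index of the class with the highest posterior probability."""
    post = class_posteriors(log_likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)
```

Subtracting the largest joint log-probability before exponentiating leaves the ratio in Equation (19) unchanged, since the common factor cancels between numerator and denominator.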



4 Enhanced ICA Mixture Model (EICAMM) 

The Enhanced ICA Mixture Model (EICAMM) proposed in this paper was derived 
from some important modifications to ICAMM, considering both modelling 
and implementation aspects, which are discussed in this section. 

4.1 Reformulating the Class Model 

Instead of considering the bias term as added to the data after they are generated 
by an ICA model (Equation 17), in EICAMM the bias vectors are considered 
to be mixed with the signal sources. This modification leads to another equation 
describing the data in each class, given by: 

$$x_t = A_k (s_k + b_k) \tag{23}$$




4.2 Reformulating the Bias Learning Rule 

For a more informative bias learning rule, in EICAMM the bias updating rule 
is derived as in [2], where it is formulated using an approach that maximizes the 
mutual information that the output Y of a neural network processor contains 
about its input X. According to the ideas in [2], when a single input x passes 
through a nonlinear transformation function g(x), it yields an output variable y. 
The mutual information between these two variables is maximized by 
aligning high-density parts of the probability density function of x with highly 
sloping parts of the function g(x). 

In this sense, the learning rule for weights and biases in such neural network 
are formulated using some nonlinear transfer function. In the formulation of 
EICAMM, the learning rule for the bias term is formulated as in [2] , considering 
the hyperbolic tangent transfer function, as given by: 



Ab fc oc — 2tanh(sfc). 



(24) 
4.3 An Orthogonalization Problem

Since, in the derived formulation of ICAMM [9], $W_k = A_k^{-1}$, Equation
(13) can be written as:

$$\Delta W \propto \left[I - K\tanh(u)u^T - uu^T\right] A^{-1} \tag{25}$$

If A is an orthogonal matrix then, by a property of orthogonal
matrices, $A^{-1} = A^T$. Therefore, as a further modification
incorporated into EICAMM, the updating rule for $A_k$ is now given by:

$$\Delta A_k \propto p(C_k \mid x_t, \theta)\, A_k \left(I - K\tanh(s_k)s_k^T - s_k s_k^T\right)^T \tag{26}$$

where $s_k = A_k^T x_t - b_k$, following the model for the data in each class used in
EICAMM (Equation (23)). Note that the transpose
is used instead of the inverse matrix $A_k^{-1}$, which gives EICAMM a computational
advantage over ICAMM.

Consequently, in EICAMM, the mixing matrices $A_k$ should be orthogonalized
in each iteration using the following equation:

$$A_k = A_k \left(A_k^T A_k\right)^{-1/2} \tag{27}$$
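
The two steps of this subsection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the random matrix, the bias vector and the sign matrix K are hypothetical values chosen only to show the shapes involved, and the sketch assumes the orthogonalization uses the inverse square root of $A_k^T A_k$.

```python
import numpy as np

def symmetric_orthogonalize(A):
    """Return A (A^T A)^(-1/2), the symmetric orthogonalization of A."""
    # A^T A is symmetric positive definite when A has full rank,
    # so its inverse square root follows from an eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return A @ inv_sqrt

def update_direction(A, b, x, K, posterior):
    """Right-hand side of the EICAMM rule for one class and one sample."""
    s = A.T @ x - b                      # s_k = A_k^T x_t - b_k
    n = s.size
    M = np.eye(n) - K @ np.outer(np.tanh(s), s) - np.outer(s, s)
    return posterior * A @ M.T

rng = np.random.default_rng(0)
A = symmetric_orthogonalize(rng.normal(size=(3, 3)))
b = np.array([0.5, -1.0, 2.0])           # hypothetical bias vector
s_true = np.array([0.1, -0.2, 0.3])      # hypothetical sources
x = A @ (s_true + b)                     # data model x_t = A_k (s_k + b_k)
K = np.diag([1.0, 1.0, -1.0])            # hypothetical signs (super-/sub-Gaussian)
step = update_direction(A, b, x, K, posterior=0.7)
```

Because A is orthogonal after the orthogonalization step, the sources are recovered with a transpose rather than a matrix inverse, which is the computational saving mentioned above.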

4.4 Incorporating Second Derivative Information 

When the second derivative of the objective function is relatively easy to
compute, incorporating it is an effective way to speed up
convergence and to bound the minimum of the function. With
this motivation, a modification to the updating rule for $A_k$ is proposed
here that incorporates the second derivative of the log-likelihood function.
This section shows how Newton's method and the Levenberg-Marquardt
method can be formulated for use in EICAMM.

• Newton's Method: To apply Newton's method in EICAMM, the
second derivatives of the log-likelihood, given by the Hessian matrix H,
are incorporated into the learning rule in Equation (20). The
new updating rule is:

$$\Delta A_k \propto p(C_k \mid x_t, \theta)\, H^{-1} A_k \left(I - K\tanh(s_k)s_k^T - s_k s_k^T\right)^T \tag{28}$$

The derivation of the matrix H for EICAMM is presented in Appendix A.

• Levenberg-Marquardt Method: When the Levenberg-Marquardt
method is incorporated into the updating rule for $A_k$, a further modification is made to
guarantee that the matrix to be inverted is positive definite, which is
sufficient for it to be invertible [1]. This modification is given by:

$$\Delta A_k \propto p(C_k \mid x_t, \theta)\, (H + \mu I)^{-1} A_k \left(I - K\tanh(s_k)s_k^T - s_k s_k^T\right)^T \tag{29}$$

where $\mu$ is a small constant in (0, 1).
In the experiments of this work, EICAMM uses the Levenberg-Marquardt
method in its learning rule for $A_k$, as in Equation (29).
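
The effect of the damping term can be seen in a small NumPy sketch. The matrices below are hypothetical and are not the Hessian derived in Appendix A; they only illustrate, under that assumption, that adding a multiple of the identity keeps the linear system solvable when the undamped matrix is singular.

```python
import numpy as np

def lm_step(H, G, mu=0.1):
    """Solve (H + mu*I) X = G, i.e. X = (H + mu*I)^(-1) G, without forming the inverse."""
    n = H.shape[0]
    return np.linalg.solve(H + mu * np.eye(n), G)

# A singular, positive semi-definite matrix: the plain Newton step
# H^(-1) G is undefined here, but the damped step is not.
H = np.array([[2.0, 0.0],
              [0.0, 0.0]])
G = np.eye(2)                     # hypothetical gradient-like term
step = lm_step(H, G, mu=0.1)
```

Larger values of the damping constant move the step toward a plain gradient step, while smaller values move it toward the Newton step.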



5 Experimental Results 

To compare the performance of EICAMM with that of ICAMM, both
approaches were used to classify randomly generated data and the well-known
Iris data set. In the first case, random data with Laplace distributions were
generated, varying the number of classes (K = 2 and K = 3) and the data
dimension (N = 2 and N = 3). In these experiments, each class contains 1000
examples.

The Iris flower data set [5] contains three classes of 50 instances each, with
four numeric attributes, where each class refers to a type of iris flower. One class
is linearly separable from the other two, but the other two are not linearly
separable from each other. The tests for this data set were performed for all
three classes and for only two classes that are linearly separable.

Tables 1 and 2 provide the classification results for the randomly generated data,
in terms of overall accuracy, for ICAMM and EICAMM, respectively.
EICAMM clearly outperformed ICAMM in all cases,
achieving its best performance (81.85%) for 2D artificial data with two classes.

The classification results for the Iris flower data set, in terms of overall
accuracy, are presented in Table 3 for both models. Here, EICAMM also provided
better results than ICAMM.



Table 1. Classification Results for Randomly Generated Data - ICAMM

                2 classes    3 classes
2 dimensions    50.00%       33.00%
3 dimensions    50.00%       33.00%



Table 2. Classification Results for Randomly Generated Data - EICAMM

                2 classes    3 classes
2 dimensions    81.85%       76.65%
3 dimensions    79.70%       55.00%



Table 3. Classification Results for the Iris Flower Data Set

             ICAMM    EICAMM
2 Classes    54%      99%
3 Classes    25%      75.76%
6 Conclusions

This work introduced the Enhanced ICA Mixture Model (EICAMM), which
incorporates some modifications to the learning rules of the original ICAMM algorithm,
considering both modelling and implementation aspects. To evaluate
the proposed model, a comparative study presented results
obtained by EICAMM and by the original ICAMM. The
proposed modifications improved the classification performance
on both the randomly generated data and the well-known Iris flower data set.

As future work, additional experimental tests, including image data for seg- 
mentation, will be performed. 
Appendix A - Derivation of Second Derivative for the Log-Likelihood Function

$$\Delta W \propto \left[I - K\tanh(u)u^T - uu^T\right] W \tag{30}$$

Differentiating the right-hand side of Equation (30) with respect to W, we have:

$$
\begin{aligned}
\frac{\partial\left[(I - K\tanh(u)u^T - uu^T)\,W\right]}{\partial W}
&= (I - K\tanh(u)u^T - uu^T) + \frac{\partial (I - K\tanh(u)u^T - uu^T)}{\partial W}\,W \\
&= (I - K\tanh(u)u^T - uu^T) + \left(-K\tanh(u)\,s^T A^T - K\,\mathrm{sech}^2(u)\,As\,u^T - u\,\frac{\partial u^T}{\partial W} - \frac{\partial u}{\partial W}\,u^T\right) W \\
&= (I - K\tanh(u)u^T - uu^T) - K\tanh(u)u^T - K\,\mathrm{sech}^2(u)\,uu^T - u\,(As)^T W - As\,u^T W
\end{aligned}
$$

Using the result $u_t = W x_t = W A s_t$ and substituting in this expression, we
have:

$$I - K\tanh(u)u^T - uu^T - K\tanh(u)u^T - K\,\mathrm{sech}^2(u)\,uu^T - 2uu^T = I - 2K\tanh(u)u^T - K\,\mathrm{sech}^2(u)\,uu^T - 3uu^T$$
Abstract. In this paper we present an efficient solution, based on active
and instance-based machine learning, to the problem of analyzing
galactic spectra, an important problem in modern cosmology. The input 
to the algorithm is the energy flux received from the galaxy; its expected 
output is the set of stellar populations and dust abundances that make 
up the galaxy. Our experiments show very accurate results using both 
noiseless and noisy spectra, and also that a further improvement in ac- 
curacy can be obtained when we incorporate prior knowledge obtained 
from human experts. 

Keywords: active learning, instance-based learning, locally-weighted re- 
gression, prior knowledge. 



1 Introduction 

In astronomy and other scientific disciplines, we are currently facing a massive 
data overload. With the development of new automated telescopes dedicated 
to systematic sky surveys, archives that are several terabytes in size are being 
generated. For example, the Sloan Digital Sky Survey [1], currently underway, 
will provide astronomers with high-quality spectra of several million galaxies. 
A complete analysis of these spectra will undoubtedly provide knowledge and 
insight that can improve our understanding of the evolution of the universe. Such 
analysis is, however, impossible using traditional manual or semimanual means, 
thus automated tools will have to be developed. 

In recent years, astronomers and machine learning researchers have started 
collaborating towards the goal of automating the task of analyzing astronomical 
data. For example, successful approaches for automated galaxy classification 
have been proposed using decision trees [2], neural networks [3,4] and ensembles 
of classifiers [5, 6]. 

For problems that are inherently high-dimensional, such as the analysis of galactic
spectra, machine learning approaches have been less successful. In high-dimensional
spaces, learning algorithms face the well-known curse of dimensionality [7],
which states that the number of training examples needed to approximate
a function accurately grows exponentially with the dimensionality of the task.

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 215-224, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

In this paper we propose a method that makes it possible to approximate non-linear
multidimensional functions using a small initial training set and active learning
to augment this training set as needed, according to the elements of the test set.
We apply this method to the problem of analyzing galactic spectra, in which
one seeks to determine, from the spectrum of a galaxy, the individual stellar
populations that make it up, as well as the abundance of dust in it. Our method
can also take advantage of prior domain knowledge, which is used to further
increase the accuracy of the results obtained.

The remainder of this paper is organized as follows. Section 2
describes the problem of analyzing galactic spectra; Section 3 presents our method
for function approximation; Section 4 presents experimental results; and Section
5 presents conclusions and proposes directions for future work.

2 Analysis of Galactic Spectra 

Nearly all information about a star is contained in its spectrum, which is a plot of 
flux against wavelength. By analyzing a galaxy spectrum we can derive valuable 
information about its star formation history, as well as other physical parameters 
such as its metal content, mass and shape. The accurate knowledge of these 
parameters is very important for cosmological studies and for the understanding 
of galaxy formation and evolution. Template fitting has been used to carry out 
estimates of the distribution of age and metallicity from spectral data. Although 
this technique achieves good results, it is very expensive in terms of computing 
time and therefore can be applied only to small samples. 

2.1 Modelling Galactic Spectra 

Theoretical studies have shown that a galactic spectrum can be modelled with 
good accuracy as a linear combination of three spectra, corresponding to young, 
medium and old stellar populations, together with a model of the effects of 
interstellar dust in these individual spectra. 

Interstellar dust absorbs energy preferentially at short wavelengths, near the 
blue end of the visible spectrum, while its effects on longer wavelengths, near 
the red end of the spectrum, are small. This effect is called reddening in the 
astronomical literature. 

Let $f(\lambda)$ be the energy flux emitted by a star or group of stars at wavelength
$\lambda$. The flux detected by a measuring device is then $d(\lambda) = f(\lambda)(1 - e^{-r\lambda})$, where
r is a constant that defines the amount of reddening in the observed spectrum
and depends on the size and density of the dust particles in the interstellar
medium.

A simulated galactic spectrum, $g(\lambda)$, can be built given $c_1, c_2, c_3$, the relative
contributions of young, medium and old stellar populations, respectively; their
reddening parameters $r_1, r_2, r_3$; and the ages of the populations $a_1, a_2, a_3$:

$$g(\lambda) = \sum_{i=1}^{3} c_i\, s(a_i, \lambda)\,(1 - e^{-r_i \lambda})$$

where $g(\lambda)$ is the energy flux detected at wavelength $\lambda$ and $s(a_i, \lambda)$ is the flux
emitted by a stellar population of age $a_i$ at wavelength $\lambda$.

Therefore, the task of analyzing an observational galactic spectrum t consists
of finding the values of $c_1, c_2, c_3, r_1, r_2, r_3, a_1, a_2$ and $a_3$ that minimize:

$$\sum_{\lambda} \left(t(\lambda) - g(\lambda)\right)^2$$

Clearly, $c_1, \ldots, c_3$ have to be non-negative and sum to 1. Realistic
values of $r_1, \ldots, r_3$ lie in the narrow range $[1 \times 10^{-5}, 6 \times 10^{-4}]$, and using only a few
discrete values for $a_1$, $a_2$ and $a_3$ normally suffices for an accurate approximation.
In particular, for our experiments we consider $a_1 \in \{3 \times 10^{6}\}$,
$a_2 \in \{10^{8}, 3 \times 10^{8}, 5 \times 10^{8}, 8 \times 10^{8}\}$, and
$a_3 \in \{10^{9}, 2 \times 10^{9}, 3 \times 10^{9}, 5 \times 10^{9}, 10^{10}\}$.
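
The spectrum model and the squared-error objective above can be written as a short NumPy sketch. The wavelength grid and the three template rows are hypothetical placeholders for the population spectra $s(a_i, \lambda)$, which in practice would come from stellar population models; the function names are also assumptions made for illustration.

```python
import numpy as np

def simulate_spectrum(c, r, templates, wavelengths):
    """g(lambda) = sum_i c_i * s(a_i, lambda) * (1 - exp(-r_i * lambda))."""
    c = np.asarray(c, dtype=float)[:, None]      # shape (3, 1)
    r = np.asarray(r, dtype=float)[:, None]      # shape (3, 1)
    reddened = templates * (1.0 - np.exp(-r * wavelengths[None, :]))
    return (c * reddened).sum(axis=0)

def squared_error(t, g):
    """Sum over wavelengths of (t(lambda) - g(lambda))^2."""
    return float(np.sum((np.asarray(t) - np.asarray(g)) ** 2))

# Hypothetical inputs: 8 wavelengths and one template row per population.
wavelengths = np.linspace(3300.0, 6800.0, 8)
templates = np.vstack([
    np.ones(8),                 # stand-in for s(a_1, lambda)
    np.linspace(1.0, 2.0, 8),   # stand-in for s(a_2, lambda)
    np.linspace(2.0, 1.0, 8),   # stand-in for s(a_3, lambda)
])
c = [0.2, 0.3, 0.5]             # contributions: non-negative, sum to 1
r = [1e-4, 3e-4, 5e-4]          # reddening values inside the stated range
g = simulate_spectrum(c, r, templates, wavelengths)
```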

The next section describes the method we propose to solve this problem.

3 The Methods 

In the problem we are trying to solve here, galactic spectral analysis, the
algorithm tries to predict the reddening parameters for three age populations from a
high-dimensional spectrum. To circumvent the curse of dimensionality, we
partition the problem into three subproblems, each of which can be solved
by a different method. The key observation is that if we knew the values of
the reddening parameters, we could simply perform a brute-force search over the
possible combinations of values for the ages of the stellar populations (a total of
1 × 4 × 5 = 20) and, for each combination of ages, find the contributions that
best fit the observation using least squares. The best overall fit would then be
the combination of ages and contributions that resulted in the best match to the
test spectrum. Thus the crucial subproblem to be solved is that of determining
the reddening parameters.
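
The brute-force search described above can be sketched as follows, assuming the reddening parameters are already known. The `toy_template` function is a hypothetical stand-in for the population spectra, and plain least squares is used without enforcing that the contributions are non-negative and sum to 1, so this is only an illustration of the search structure.

```python
import itertools
import numpy as np

def toy_template(age, wavelengths):
    """Hypothetical stand-in for s(a, lambda): a bump whose centre moves with age."""
    centre = 500.0 * np.log10(age)
    return np.exp(-0.5 * ((wavelengths - centre) / 300.0) ** 2)

def fit_given_reddening(t, r, wavelengths, age_grid, template=toy_template):
    """Search all age triples; for each, fit the contributions by least squares."""
    best = None
    for ages in itertools.product(*age_grid):
        # One row per population: reddened template sampled at every wavelength.
        R = np.vstack([
            template(a, wavelengths) * (1.0 - np.exp(-ri * wavelengths))
            for a, ri in zip(ages, r)
        ])
        # Unconstrained least squares for c such that c @ R approximates t.
        c, *_ = np.linalg.lstsq(R.T, t, rcond=None)
        err = float(np.sum((c @ R - t) ** 2))
        if best is None or err < best[0]:
            best = (err, ages, c)
    return best

age_grid = [
    [3e6],
    [1e8, 3e8, 5e8, 8e8],
    [1e9, 2e9, 3e9, 5e9, 1e10],
]                                   # 1 x 4 x 5 = 20 combinations
wavelengths = np.linspace(3300.0, 6800.0, 200)
r = (1e-4, 3e-4, 5e-4)
true_ages, true_c = (3e6, 3e8, 3e9), np.array([0.2, 0.3, 0.5])
R_true = np.vstack([
    toy_template(a, wavelengths) * (1.0 - np.exp(-ri * wavelengths))
    for a, ri in zip(true_ages, r)
])
t = true_c @ R_true                 # synthetic "observed" spectrum
err, ages, c = fit_given_reddening(t, r, wavelengths, age_grid)
```

With the correct reddening values, the search only has to compare 20 small linear fits, which is why the reddening prediction is the crucial subproblem.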

Predicting the reddening parameters from spectra is a difficult non-linear
optimization problem, especially for noisy spectra. We propose to
solve it using an iterative active learning algorithm that learns the function from
spectra to reddening parameters. In each iteration, the algorithm uses its training
set to build an approximator that predicts the reddening parameters of the spectra
in the test set. Once the algorithm has predicted these parameters, it uses them
to find the combination of ages and contributions that yields the best match to
the observed spectra. From these parameters we can generate the corresponding
spectrum and compare it with the spectrum under analysis. If they are a close
match, the parameters found by the algorithm are accepted as correct; if not, we
add the newly generated training example (the predicted parameters and their
corresponding spectrum) to the training set and proceed to a new iteration. Since this
type of active learning adds to the training set examples that are progressively
closer to the points of interest, the errors are guaranteed to decrease in every
iteration until convergence is attained.

Table 1. Pseudocode of our Active Learning Algorithm

0. Let T be the set of test spectra
1. S = {}
2. For i = 1 to n
   a. Generate random parameter vector p = [c_1, c_2, c_3, r_1, r_2, r_3, a_1, a_2, a_3]
   b. Generate spectrum s according to p
   c. S = S ∪ {⟨s, (r_1, r_2, r_3)⟩}
3. While T ≠ {} do:
   a. Build C, an ensemble of approximators using LWLR
   b. For every test spectrum t ∈ T
      - Use C to predict the reddening parameters r_1, r_2, r_3 of t
      - For every triple ⟨a_1, a_2, a_3⟩ ∈ {3 x 10^6} x {10^8, 3 x 10^8, 5 x 10^8, 8 x 10^8} x
        {10^9, 2 x 10^9, 3 x 10^9, 5 x 10^9, 10^10}
            R = [ s(a_1, λ_1)(1 - e^(-r_1 λ_1)), ..., s(a_1, λ_m)(1 - e^(-r_1 λ_m)) ]
                [ s(a_2, λ_1)(1 - e^(-r_2 λ_1)), ..., s(a_2, λ_m)(1 - e^(-r_2 λ_m)) ]
                [ s(a_3, λ_1)(1 - e^(-r_3 λ_1)), ..., s(a_3, λ_m)(1 - e^(-r_3 λ_m)) ]
            [c_1, c_2, c_3] = t (R^T R)^(-1) R^T
            Generate spectrum g according to q = [c_1, c_2, c_3, r_1, r_2, r_3, a_1, a_2, a_3]
            error(q) = Σ_λ (g(λ) - t(λ))^2
      - Let q* = argmin error
      - If error(q*) < threshold
            output ⟨t, q*⟩
            T = T - {t}
      - Else S = S ∪ {⟨s, (r_1, r_2, r_3)⟩}

An outline of the algorithm is given in Table 1. In steps 1 and 2, the algorithm
constructs an initial training set by randomly generating parameter vectors (the
target function) and computing their corresponding spectra (the attributes).
Step 3 is repeated until a satisfactory fit has been found for every spectrum
in the test set. First, an approximator is built to derive reddening parameters
from spectra using an ensemble of locally-weighted linear regression (LWLR), a
well-known instance-based learning algorithm. Using this approximator, the algorithm tries
to find the reddening parameters of each spectrum in the test set. Given the
candidate reddening parameters, it searches for the combination of ages
and contributions that yields the minimum squared error. If the error is smaller
than a threshold, it outputs the set of parameters found for that spectrum and
removes the spectrum from the test set; if the error is not small enough, it adds the new
training example to the training set and continues.
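
The control flow of this loop can be sketched in Python with a pluggable predictor and simulator. Everything below the `active_learning` function is a toy assumption rather than the galactic model: the "spectrum" is the scalar x = r**2, and linear interpolation stands in for the LWLR ensemble. The `max_iter` cap is also an addition made here for safety.

```python
import numpy as np

def active_learning(test_set, train_x, train_y, predict, simulate, threshold, max_iter=100):
    """Generic form of the active learning loop with pluggable predictor and simulator."""
    train_x, train_y = list(train_x), list(train_y)
    pending = dict(enumerate(test_set))
    solved = {}
    for _ in range(max_iter):
        if not pending:
            break
        for idx, t in list(pending.items()):
            params = predict(train_x, train_y, t)    # predicted parameters
            g = simulate(params)                     # spectrum they generate
            if np.sum((np.asarray(g) - np.asarray(t)) ** 2) < threshold:
                solved[idx] = params
                del pending[idx]
            else:
                # Not close enough: keep the simulated pair as a new training example.
                train_x.append(g)
                train_y.append(params)
    return solved, pending

# Toy problem (an assumption for illustration): parameter r, "spectrum" x = r**2.
def simulate(r):
    return r ** 2

def predict(train_x, train_y, x):
    """Stand-in predictor: linear interpolation over the current training pairs."""
    order = np.argsort(train_x)
    return float(np.interp(x, np.asarray(train_x)[order], np.asarray(train_y)[order]))

solved, pending = active_learning(
    test_set=[0.25, 0.64],
    train_x=[0.0, 1.0],
    train_y=[0.0, 1.0],
    predict=predict,
    simulate=simulate,
    threshold=1e-12,
)
```

In this toy setting each rejected prediction adds a training pair that lies closer to the test point, so the next interpolation is more accurate, which mirrors the refinement behaviour described in the text.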

It should be pointed out that the active learning algorithm is independent
of the choice of learning algorithm used to predict the reddening parameters. Any
algorithm that is suitable for predicting real-valued target functions from real-valued
attributes could be used. In this work we use an ensemble of locally-weighted
linear regression (LWLR) models. To build the ensemble we used a method that
manipulates the attribute set: each member of the ensemble uses a different
subset randomly chosen from the attribute set. More information on
ensemble methods, such as boosting and error-correcting output coding, can be
found in [8]. We chose LWLR as the base learning algorithm because it has
been applied successfully to several complex problems and, like all
instance-based learning algorithms, it does not require any training time. It is therefore
well suited to situations in which the training set is continually modified, as is
the case with our method. For a description of LWLR we refer the reader to [9].
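
A minimal sketch of an LWLR predictor and of an ensemble built on random attribute subsets is given below. The Gaussian kernel, the bandwidth, the small ridge term and the averaging of member predictions are assumptions made for this illustration; the paper refers to [9] for the exact formulation.

```python
import numpy as np

def lwlr_predict(X, y, x_query, bandwidth=1.0, ridge=1e-8):
    """Locally-weighted linear regression prediction at a single query point."""
    # Gaussian kernel weights based on Euclidean distance to the query.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # add intercept column
    xq = np.append(x_query, 1.0)
    WX = Xb * w[:, None]
    # Weighted normal equations; the ridge term only guards against singularity.
    beta = np.linalg.solve(Xb.T @ WX + ridge * np.eye(Xb.shape[1]), WX.T @ y)
    return xq @ beta

def ensemble_predict(X, y, x_query, n_members=5, subset_size=2, seed=0, **kwargs):
    """Average LWLR predictions over members that see random attribute subsets."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        cols = rng.choice(X.shape[1], size=subset_size, replace=False)
        preds.append(lwlr_predict(X[:, cols], y, x_query[cols], **kwargs))
    return np.mean(preds, axis=0)

# Hypothetical data: a linear target, which LWLR reproduces almost exactly.
rng = np.random.default_rng(1)
X = rng.uniform(size=(40, 3))
w_true = np.array([1.0, 2.0, 3.0])
y = X @ w_true + 0.5
xq = np.array([0.3, 0.6, 0.9])
pred = lwlr_predict(X, y, xq)
ens = ensemble_predict(X, y, xq)
```

Since no model is fitted ahead of time, adding a row to `X` and `y` is all that is needed to update the predictor, which is why an instance-based learner suits a training set that changes at every iteration.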

4 Experimental Results 

In all the experiments reported here we use the following procedure: we randomly
generated a set of galactic spectra with their corresponding parameters. This set
was randomly divided into two equally sized disjoint subsets; one subset was
used for training and the other as the test set. This procedure was
repeated 10 times, and we report the overall average.

In the first set of experiments our objective was to measure empirically the
advantage of the active learning procedure over a traditional ensemble of LWLR.
As mentioned previously, the ensembles were constructed by randomly selecting a
subset of the attributes. To make the comparison objective, both methods
used the same attribute subsets and an ensemble of size 5. Table 2 shows the mean
absolute errors in the prediction of the reddening parameters and the relative contributions,
together with the root mean squared error between the predicted and the real
test galactic spectra. Our optimization algorithm reduces the error
in the estimation of the reddening parameters (from 2.0583e-004 to 1.8030e-004), and
it yields a more accurate computation of the relative contributions of the age
populations.

4.1 Incorporating Prior Knowledge 

The experimental results presented above show that the active learning procedure is
a very accurate method for determining the star formation history of galaxies.
Those experiments used the complete energy flux from the galaxy as attributes.
In contrast, an expert astronomer can derive, with very high accuracy, physical
parameters of a star by measuring only certain regions of the spectrum along the
wavelength axis. These regions are known in the literature as Lick indices and were
defined in 1973 by Faber et al. [10]. Although Lick indices were defined when
model resolution was very low, astronomers have been using them for several
decades now. In the following experiments we explore whether the
use of this prior knowledge can help machine learning algorithms to provide a
more accurate prediction.

Table 2. Comparison between an ensemble of LWLR (LWLR-ensemble) and active
learning (A)

Algorithm        M.A.E. Contributions    M.A.E. Reddening    R.M.S.E. Flux
LWLR-ensemble    0.0821                  2.0583e-004         8.4188e-007
A                0.0773                  1.8030e-004         6.0688e-007

Fig. 1. Graphical comparison of results. Figure (a), from top to bottom and shifted by a
constant to aid visualization: original test spectrum, spectrum recovered using ensemble
of LWLR and original data, spectrum recovered using original data and active learning,
and spectrum recovered using active learning and prior knowledge. Figures (b) to
(d) show the relative difference between the test spectrum and the predicted spectra,
in the same order

We introduce prior knowledge into the learning task by increasing the relevance
of the Lick indices when LWLR calculates the Euclidean distance from the
query point to the training data. Specifically, we multiply the energy fluxes
at the wavelengths corresponding to the Lick indices by a constant k = 4. That
is, fluxes in regions defined by Lick indices were deemed to be 4 times as
important as fluxes in other regions. This value of k was set experimentally with a
10-fold cross-validation procedure. Table 3 presents prediction errors using prior
knowledge. Row PK-LWLR shows the results of using prior knowledge
with an ensemble of LWLR; row A+PK presents the results of the active learning
algorithm with prior knowledge. Comparing Table 2 and Table 3,
we can see that even though the active learning algorithm does better than
PK-LWLR, prior knowledge with active learning (A+PK) yields the best results.
Figure 1 presents a graphical comparison of an ensemble of LWLR using the original
data, active learning using the original data, and prior knowledge with
active learning. Figure (a) shows the original test spectrum and the
spectra reconstructed by each of the three methods. Figures (b) to (d) show the
relative difference in flux of the predicted spectra, in the same order.
The relative difference in flux for the active learning algorithm that exploits
prior knowledge is closer to a straight line, while the residuals from using the
original data and the ensemble of LWLR have some high peaks, indicating an error
of nearly 25% in some spectral regions.
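
The weighting scheme can be illustrated with a few lines of NumPy. The six-pixel spectra and the boolean mask marking the Lick regions are hypothetical; the actual wavelength bands of the Lick indices are not listed in the text.

```python
import numpy as np

def emphasize_lick(spectra, lick_mask, k=4.0):
    """Scale fluxes inside Lick-index regions by k; leave the rest unchanged."""
    return spectra * np.where(lick_mask, k, 1.0)

# Hypothetical 6-pixel spectra; the mask marks pixels inside a Lick region.
lick_mask = np.array([False, True, True, False, False, False])
query = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
inside = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0])     # differs in a Lick pixel
outside = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 1.0])    # differs elsewhere

def distance(a, b):
    """Euclidean distance after the Lick regions have been emphasized."""
    return float(np.linalg.norm(emphasize_lick(a, lick_mask) - emphasize_lick(b, lick_mask)))

d_inside = distance(query, inside)
d_outside = distance(query, outside)
```

Scaling the attributes before computing distances means the regression code itself does not have to change: a unit difference inside a Lick region simply counts k times as much as the same difference elsewhere.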



Table 3. Comparison between using prior knowledge with an ensemble of LWLR
(PK-LWLR) and prior knowledge with active learning (A+PK)

Algorithm    M.A.E. Contributions    M.A.E. Reddening    R.M.S.E. Flux
PK-LWLR      0.0840                  1.9251e-004         7.5726e-007
A+PK         0.0655                  1.3189e-004         5.0766e-007



4.2 Noisy Data Experiments 

The results presented above are encouraging. However, the data used in those
experiments were noise free. Noisy data provide a more realistic
evaluation of our algorithm, given that noise is always present in real data
analysis problems. Galactic spectral analysis is no exception to this rule: noise
can come from the source itself or from a bad calibration of the instruments.
For this reason, we performed a new set of experiments aimed at exploring the
noise sensitivity of our active learning algorithm. We followed the same
procedure described previously, except that this time we added to the test data
Gaussian noise with zero mean and a standard deviation of one, using a
signal-to-noise ratio of 50.



Table 4. Comparison between an ensemble of LWLR, active learning and active
learning with prior knowledge using noisy test data

Algorithm        M.A.E. Contributions    M.A.E. Reddening    R.M.S.E. Flux
LWLR ensemble    0.1145                  2.6589e-004         1.1732e-006
A                0.1058                  2.6533e-004         8.0577e-007
A+PK             0.0880                  2.2244e-004         8.5577e-007
Fig. 2. Graphical comparison of results using noisy data. Figure (a), from top to
bottom and shifted by a constant to aid visualization: original test spectrum, spectrum
recovered using ensemble of LWLR and original data, spectrum recovered using
original data and active learning, and spectrum recovered using active learning and prior
knowledge. Figures (b) to (d) show the relative difference between the test spectrum and
the predicted spectra, in the same order



In order to deal with noisy data we used standard principal component anal- 
ysis. Principal component analysis seeks a set of M orthogonal vectors v and 
their associated eigenvalues k which best describes the distribution of the data. 
This module takes as input the training set, and finds its principal components 
(PCs). The noisy test data are projected onto the space defined by the first 
20 principal components, which were found to account for about 99% of the 
variance in the set, and the magnitudes of these projections are used as
attributes for the algorithm. Table 4 compares, as before, an ensemble of LWLR
using the original noisy data, active learning using the
original data, and prior knowledge with active learning. As expected, with
noisy data the best results have higher error rates than the results presented in
Table 3. However, when using prior knowledge and active learning (A+PK),
the prediction of the reddening parameters is more accurate than with the other
two methods: the LWLR ensemble using the original data and active learning using
the original data (A). Figure 2 presents a graphical comparison of results.
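
The projection step can be sketched with NumPy's singular value decomposition. The random arrays are hypothetical stand-ins for the training and noisy test spectra, and centring the data on the training mean is an assumption of this sketch; only the choice of 20 components comes from the text.

```python
import numpy as np

def fit_pca(train, n_components):
    """Return the training mean, the leading principal directions and their variance share."""
    mean = train.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(train - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, Vt[:n_components], explained[:n_components]

def project(data, mean, components):
    """Coordinates of each row of data in the principal-component space."""
    return (data - mean) @ components.T

# Hypothetical data: 60 training spectra and 10 noisy test spectra, 30 pixels each.
rng = np.random.default_rng(0)
train = rng.normal(size=(60, 30))
noisy_test = rng.normal(size=(10, 30))
mean, components, explained = fit_pca(train, n_components=20)
projected = project(noisy_test, mean, components)
```

The principal components are found on the training set only, and the noisy test spectra are then expressed in that basis, so the attributes passed to the learner are the 20 projection magnitudes rather than the raw fluxes.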

A relevant difference between the noisy and noiseless experiments is
that the root mean squared error in the energy flux attained with active learning
and the original data is lower than the one obtained using prior knowledge with active
learning. This is somewhat surprising, given that in the other two error comparisons
the opposite occurs. We believe this may be due to the higher weight given
to the Lick indices when computing the reddening parameters,
as well as the pseudoinverse. When the algorithm is given the original data it
tries to minimize the root mean squared error along the complete energy flux,
and each difference at a wavelength λ along the spectrum has the same importance in the
calculation of this error. This implies that it is not very important if the error
is a little high in a few spectral lines, as long as the
difference at each λ is minimal for the vast majority. In contrast, by using prior knowledge we are
trying to minimize the error in a few spectral regions that are weighted more
heavily, while the rest of the spectrum has a rather low weight in the calculation
of the root mean squared error.

5 Conclusions 

In this paper we have presented a learning algorithm for optimization that has
a very strong feature: the ability to extend the training set automatically in
order to best fit the target function for the test data. There is no need for manual
intervention, and if new test instances need to be classified the algorithm will
generate as many training examples as needed.

We have shown experimental results of applying our method to
the problem of finding, given a large set of galactic spectra, their corresponding
star formation history and reddening. The method yields very accurate results,
especially when using prior knowledge, even in the presence of noise.

Present and future work includes: 

— Testing the method using real galactic spectra, taken from the Sloan Digital 
Sky Survey. 

— Using models with different metallicities and exploring the performance of
our method when a new parameter, metallicity, is introduced to the
problem.

— Exploring a different method for introducing prior knowledge. 

— Testing the active learning algorithm in other scientific data analysis tasks. 
Abstract. In this paper, we introduce a new idea to reduce the hard,
time-consuming task of finding the set of initial parameter values of an
evolutionary algorithm that uses the inheritance strategy. The key idea
is to adapt the parameter values during the execution of the algorithm.
This adaptation is guided by the degree of difficulty that the problem
being solved poses for the algorithm. This strategy has been
tested using an evolutionary algorithm to solve CSPs, but it can easily be
extended to any evolutionary algorithm that uses inheritance. A set of
benchmarks showed that the new strategy helps the algorithm to solve
more problems than the original approach.



1 Introduction 

Designing genetic algorithms to tackle a complex problem is a time consuming 
task. It requires creating a new algorithm for each specific problem. Roughly 
speaking, we need to define at least the following: 

— a representation, 

— some especially designed operators 

— the “best” set of parameter values or, an associated control strategy 

Parameter control is better than parameter tuning for solving NP-hard and
constrained problems [7]. Many kinds of parameter control techniques have been
proposed in the literature; however, they are usually not cooperative. In
this paper, we propose a way of increasing the cooperation between inheritance
operators and filtering, using the problem complexity.

In the approach that uses inheritance, given a pool of operators available to be
applied, the algorithm itself selects the “most appropriate” operator to be used.
The offspring not only inherits the variable values from its parents, but also
inherits the operator that was used to create one of them. By inheriting
from the parents' behaviour, the algorithm is able to discard the operators that are not
appropriate for solving a given problem. The goal is to propose a strategy that can help
the evolutionary algorithm that uses inheritance to select a good set of operator
parameter values. The inheritance algorithm itself is able to change its parameter
values during the run depending on the state of the search. However, we know that
the initial set of parameter values is relevant to obtaining better performance.

The tests presented in this paper use three well-known crossover operators
for inheritance, GPX [9], Arc-crossover and Self-arc-crossover [13], which were
especially designed for graph coloring problems.

The motivation of this work is to show that coupling a more refined
mechanism to an evolutionary algorithm that uses inheritance can improve its
performance, helping the algorithm to solve a greater number of problems.

The paper is organized as follows. In the next section we briefly describe the
inheritance strategy. In Section 3 we introduce the new parameter control
strategy called “Dynamic Filtering Parameters Values” (DFPV). Section 4
presents the results obtained using both randomly generated graph coloring
problems and specialized crossover operators. Finally, in the last section we present
the conclusions and future work.

2 Operators Inheritance Algorithm 

In previous research we used inheritance to find the most appropriate
operator, from a pool of operators, that could potentially generate better offspring. In
this approach, the offspring not only inherits the variable values from its
parents, but also inherits the operator that was used to create one of them.
For instance, the inheritance process for crossover operators that need
two parents to create one child is shown in Figure 1.

Using this strategy we have concluded in [14] that: 

— Given a pool of operators proposed in the literature in an evolutionary algo- 
rithm, it is able to select the best operator to be applied during the search. 



Begin /* Procedure Operators Inheritance Algorithm */
  Parent1 = select(population(t))
  Parent2 = select(population(t))
  if both parents have been generated by the same crossover operator cross-op then
    Generate the offspring using cross-op
  if only one parent has been generated by a crossover operator cross-op then
    Generate the offspring using cross-op
  if the parents have been created by different crossover operators
     cross-op-1, cross-op-2 then
    Generate the offspring using the crossover operator used to
    create the best of the two parents
  Include the applied crossover operator information in the child
End /* Procedure */



Fig. 1. Operators Inheritance Algorithm

- At the end of the run, the worst operators have been discarded and, most of
the time, the last population is composed of chromosomes generated by the
best operators. Thus, the algorithm can identify during the search which
operators are not suitable for the problem.
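
The selection rules of Fig. 1 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation (which was written in C): the `Individual` class, the `choose_inherited_operator` function and the convention that a higher fitness is better are assumptions made for the example. Fig. 1 does not cover the case where neither parent carries an operator (for instance in the initial population); the sketch assumes that an operator is then drawn at random from the pool.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Individual:
    genes: List[int]
    fitness: float                  # assumption: higher is better
    operator: Optional[str] = None  # crossover that created it, None if unknown


def choose_inherited_operator(p1: Individual, p2: Individual,
                              rng: random.Random,
                              pool: Sequence[str]) -> str:
    """Return the name of the crossover operator the child should be built with."""
    if p1.operator is None and p2.operator is None:
        # Not covered by Fig. 1: assumed fallback to a random operator of the pool.
        return rng.choice(list(pool))
    if p1.operator is None:
        return p2.operator          # only one parent carries an operator
    if p2.operator is None:
        return p1.operator
    if p1.operator == p2.operator:
        return p1.operator          # both parents agree
    # Different operators: inherit the one that created the better parent.
    best = p1 if p1.fitness >= p2.fitness else p2
    return best.operator
```

The child would then be generated with the returned operator and tagged with its name, so that the information is available when the child itself becomes a parent.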

2.1 Specialized Crossover Operators for 3-Graph Coloring 

Constrained problems receive special attention in the evolutionary community [3]
because they are especially hard for evolutionary algorithms. For the
tests we have selected the 3-graph coloring problem, which is an NP-hard
constraint satisfaction problem. Evolutionary algorithms (EAs) for constrained
problems can be divided into two classes: EAs using adaptive fitness functions
and EAs using heuristics [8], [12], [13]. We consider here three well-known
heuristic-based operators implemented in evolutionary algorithms for graph coloring:

1. Greedy Partition Crossover Operator (GPX): GPX was designed specifically
for graph coloring problems and proposed in [9]. We use this operator
because its results equal, and sometimes exceed, those of the best known
algorithms for graph coloring problems. An important remark about GPX is
that the edges of the graph, that is, the constraints, are not involved in
this crossover operator.

2. Arc-crossover Operator: In [13], [11] we introduced an EA for solving CSPs
that uses information about the constraint network in the fitness function
and in the genetic operators. The fitness function is based on the notion
of error evaluation of a constraint, defined as the number of variables of
the constraint plus the number of variables that are connected to these
variables in the CSP graph. The crossover operator randomly selects two
parents and builds a child by means of an iterative procedure over all the
constraints of the CSP. Constraints are ordered according to their error
evaluation with respect to the instantiations of the variables that violate
the constraints.

3. Constraint Dynamic Adapting Crossover: It uses the same idea as
arc-crossover, that is, there are no fixed crossover points. This operator
uses a dynamic constraint priority, which takes into account not only the
network structure but also the current values of the parents. The whole
process is described in detail in [13].

2.2 Inheritance for 3-Graph Coloring Problems 

Analysing in more detail the technique based on inheritance of the behaviour
of the parents, we can observe that the parameter values of each operator
change during the search, even though every operator starts with the same fixed
probability. This probability is not changed explicitly. However, the crossover
operator most likely to be chosen is the one that has produced the most
individuals in the current population. For instance, if we decide to apply a
crossover operator, the individual selection mechanism will choose two
individuals from the current population. The operator that is more present in
the population has a higher probability of being applied, according to the
fitness values of its individuals. Thus, this technique can also be understood
as a self-adaptive parameter control mechanism, according to the classification
of Michalewicz et al. in [7]. This idea is illustrated in Figure 2. It shows
that in the beginning the operators are used with almost the same frequency,
but as the iteration number increases, the selection drastically reduces the
use of some operators. In this case, GPX is no longer used after 10 iterations:
its probability of being applied after these 10 generations is zero, because no
individual in the current population has been created by GPX. Thus the
algorithm has automatically removed this operator and continues its execution
using the other ones.

Fig. 2. Number of Operators Selection



3 Dynamic Filtering Parameters Values (DFPV) 

The goal is to propose a strategy that helps an evolutionary algorithm using
inheritance to select a good initial set of operator parameter values. The
inheritance algorithm itself is able to change its parameter values during the
execution depending on the state of the search. However, we have observed that
the initial set of parameter values is relevant to obtaining better performance.
Thus, the new strategy, called Dynamic Filtering Parameters Values (DFPV),
focuses on giving the inheritance algorithm a good set of initial parameter
values, keeping in mind that the algorithm may still change these values during
the search. Before introducing the strategy we need some preliminary definitions:

Definition 1. (Set of Parameters Values)

Given a set of $j$ operators belonging to an evolutionary algorithm, we define a
Set of Parameters Values $S_p$ as the set $S_p = \{P_1, \ldots, P_n\}$, such that
each $P_i = \{p_{i1}, \ldots, p_{ij}\}$, where $p_{ij}$ is the probability value of
the $j$th operator in the set $P_i$.

Definition 2. (Set of Problems Solved)

Given $S_p$, we identify the Set of Problems Solved by using the set $P_i$,
$S_{solved}(P_i)$, as all the problems solved by the algorithm using these
operator parameter values. We define the Set of Problems Solved by using the
set $S_p$ as $S_{solved}(S_p) = \bigcup_{i=1}^{n} S_{solved}(P_i)$.

Definition 3. (Filter Set of Parameter Values)

Given $S_p$ and $S_{solved}(S_p)$, we define a Filter Set of Parameter Values, $Q$,
where $Q = MINSET(S_p) = \{Q_1, \ldots, Q_k\}$ such that $S_{solved}(Q) = S_{solved}(S_p)$.

Remark 1. In the worst case $Q = S_p$ or, equivalently, $k = n$.
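
Definition 3 does not say how MINSET is computed. Finding a truly minimal covering subset is an instance of the set cover problem, so the sketch below uses the classical greedy heuristic instead; this choice, the function name `greedy_filter_set` and the dictionary layout are assumptions for illustration only, and the result covers all solved problems but is not guaranteed to be of minimum size.

```python
from typing import Dict, List, Set


def greedy_filter_set(solved: Dict[str, Set[str]]) -> List[str]:
    """Pick parameter sets until every problem solved by some set is covered.

    `solved` maps the name of a parameter set P_i to S_solved(P_i).
    """
    uncovered: Set[str] = set()
    for problems in solved.values():
        uncovered |= problems

    chosen: List[str] = []
    while uncovered:
        # Take the set covering the most still-uncovered problems
        # (ties are broken by name to keep the result deterministic).
        best = max(sorted(solved), key=lambda name: len(solved[name] & uncovered))
        chosen.append(best)
        uncovered -= solved[best]
    return chosen
```

For example, with `{"P1": {"a", "b"}, "P2": {"b"}, "P3": {"c"}}` the set `P2` is redundant, because everything it solves is already solved with `P1`.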

The set $S_{solved}(S_p)$ contains the problems for which the parameter values
in $S_p$ allow the algorithm to find a solution. To obtain it, the algorithm is
run for each $P_i$, but only for a small fixed number of iterations. The set
$Q$ represents the minimal subset of $S_p$ required to solve the same set of
problems. It is possible that the algorithm solves the same set of problems
with two different elements of $S_p$; in that case only one of them is required
to belong to $Q$. Finally, we define a hierarchy $H$ whose elements are the
elements of $Q$, ordered beginning with $H_1 = MAXSET(S_{solved}(Q_i))$,
$\forall i = 1, \ldots, k$. That is, the first element of the hierarchy is the
set of parameter values in $Q$ that has solved the most problems, and so on.
Let $PR$ be the initial set of problems, $PR = \{problem_1, \ldots, problem_x\}$.

Finally, the inheritance algorithm begins its execution using $H_1$ and continues
with $H_2$, $H_3$, until $H_k$ in the worst case. We now use the previous hierarchy to try



Begin /* Procedure Construct Hierarchy */
  E = PR
  For each P_i in S_p do
    Execute for a small number of iterations Algorithm(E, P_i)
    E = E - P_i
  endfor
  S_solved(S_p) = Union{S_solved(P_1), ..., S_solved(P_n)}
  Identify the set Q such that S_solved(Q) = S_solved(S_p)
  H = ordered(Q)



Fig. 3. Procedure Construct Hierarchy 



Begin /* Procedure DFPV */
  E = PR
  For h in H do
    Execute Algorithm(E, h)
    S = Union{S, solved(h)}
    E = PR - S
    if E is empty then STOP
  endfor



Fig. 4. Procedure DFPV

to solve the entire set of problems with a high number of iterations. Therefore,
the number of problems we try to solve for each parameter value decreases, or
stays the same if none of them has been solved with the previous value.

Let $PR$ be the initial set of problems, $PR = \{problem_1, \ldots, problem_x\}$,
and $S$ the set of already solved problems.

At the end of the hierarchy, some problems from the original set may not
have been solved ($E$ is not empty).
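
The ordering of the hierarchy and the loop of Fig. 4 can be sketched in Python as follows. The names `build_hierarchy`, `dfpv` and `run_algorithm` are hypothetical; `run_algorithm(remaining, params)` stands in for a long run of the inheritance algorithm and is assumed to return the subset of `remaining` that it solved.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def build_hierarchy(q: Iterable[str], solved: Dict[str, Set[str]]) -> List[str]:
    """Order the elements of Q by decreasing number of problems solved."""
    return sorted(q, key=lambda name: (-len(solved[name]), name))


def dfpv(problems: Set[str], hierarchy: List[str],
         run_algorithm: Callable[[Set[str], str], Set[str]]
         ) -> Tuple[Set[str], Set[str]]:
    """Try each parameter set of the hierarchy on the still-unsolved problems."""
    solved_all: Set[str] = set()       # S in Fig. 4
    remaining = set(problems)          # E in Fig. 4
    for params in hierarchy:
        if not remaining:
            break                      # every problem is solved: stop early
        solved_all |= run_algorithm(remaining, params) & remaining
        remaining = set(problems) - solved_all
    return solved_all, remaining
```

The second element of the returned pair corresponds to the set E at the end of the hierarchy, which may be non-empty.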

4 Tests 

The goal of the following benchmarks is to evaluate the effect of adding a
hierarchy to the dynamic operator selection of the constraint-graph based
evolutionary algorithm, and to compare it with the original inheritance
approach. The algorithm has been tested with randomly generated 3-coloring
graphs, subject to the constraint that adjacent nodes must be colored
differently. We used Joe Culberson's library [6] to generate the random graphs.
We have tested both algorithms on 3-coloring problems that have a solution and
correspond to sparse graphs. These kinds of graphs are known to be the most
difficult to solve [5]. For each number of constraints we have generated 500
random 3-coloring graph problems. In order to discard the “easy problems” we
have applied DSATUR [2] to solve them, and we have then selected the problems
not solved by DSATUR. DSATUR is one of the best algorithms for this kind of
problem. It is important to note that it is easy to find problems not solved by
DSATUR in the hard zone [5], which is not the case with other connectivities.

4.1 Hardware 

The hardware platform for the experiments was a Pentium IV PC at 1.4 GHz with
512 MB of RAM, running the Linux operating system. The algorithm has been
implemented in C.

4.2 Results 

For the preliminary step we have chosen 200 iterations and an initial set
of parameter values for the crossover probability of {0, 0.1, ..., 1}, used with
our original inheritance algorithm.

We tried to solve each graph for 90, 120, 150, 180, 210, 240 and 360 constraints,
from the easiest (90 constraints) to the hardest (360 constraints). Note
that this step, needed to compute the Dynamic Filtering Parameter Value
hierarchies, only requires a few seconds to complete. The number of 200
iterations is an arbitrary choice, intended to be sufficient to find the
parameter values that give good results in terms of the number of solved
problems. The preliminary step found a different hierarchy for each number of
constraints. We then ran the same graphs with the Dynamic Filtering Parameter
Value (DFPV) technique using the previously found hierarchies. To show the
efficiency of DFPV, we ran the algorithm with the same number of 200
iterations, which is rather few for solving NP-hard problems. Figure 5
shows the number of problems solved by the traditional algorithm compared to
the number solved by the DFPV technique. We can see that DFPV finds many more
solutions than the traditional inheritance algorithm that uses the best initial
parameter values for all the constraints. DFPV is able to solve 35% of the
hardest problems with 360 constraints, which is, on average, 6 times better
than the usual algorithm. Moreover, with a small number of iterations, the
number of problems solved by the traditional algorithm decreases linearly as
the number of constraints grows, while DFPV maintains a significantly slower
decreasing curve (especially between 90 and 210 constraints). This means
that DFPV shows very good efficiency for this range of constraints.

Fig. 6. DFPV Factor (plot title: DFPV Improvement Factor, 5000 iterations)

Figure 6 gives the improvement factor of DFPV with 5000 iterations, using the
hierarchy found with 200 iterations. A higher number of iterations really
helps the traditional algorithm to solve more problems. Therefore, the
difference in terms of solved problems between the two algorithms becomes
smaller. However, the harder the problem (the higher the number of
constraints), the more efficient DFPV is. We see that for 240 constraints DFPV
solves twice as many problems as the traditional inheritance approach with its
best parameter value. Figure 7 shows the number of problems solved by both
algorithms. It is clear that the difference between them quickly increases when
the number of constraints becomes higher (harder problems).

In Figure 8, we can see that the traditional inheritance algorithm benefits
from a higher number of iterations. Its performance can be up to 900% better
for 240 constraints with 5000 iterations than with 200 iterations. On the other
hand, DFPV does not benefit from a high number of iterations as much as the
non-hierarchical approach. For 240 constraints, the gain is only 39%, which is
much smaller. DFPV is able to solve 60% more problems with 200 iterations
than the traditional algorithm does with 5000 iterations. Thus, to obtain
much better results in terms of solved problems, DFPV only requires hundreds
of iterations, where the traditional approach needs thousands.

5 Conclusion 

Our new approach, the Dynamic Filtering Parameter Value technique, has
shown very good results in terms of the number of problems solved, compared
to the traditional inheritance method, which begins with the best single value
for a given parameter, previously determined by tuning. Coupled to the
inheritance mechanism, DFPV becomes more and more efficient as the problem
becomes harder to solve (up to twice as good for our 3-coloring graph problem
with 240 constraints). The preliminary step required to find the parameter
hierarchy to be used for a given problem type takes very little time, as it
only needs to run about 200 to 300 iterations. Moreover, the inheritance
mechanism significantly reduces the number of iterations needed to find a
solution for the hardest problems, as useless operators can often be identified
very quickly. DFPV also requires far fewer iterations than the original
inheritance algorithm to solve a given problem, and it gives much better
results. As a consequence, DFPV requires less CPU time to find better results,
which is very welcome when solving large NP-hard problems.

5.1 Future Work 

For a given type of problem, it would be very interesting to automatically
find the smallest number of iterations required to obtain the best hierarchy to
be used by DFPV. For example, would 150 iterations have produced a hierarchy
as good as the one found with 200 iterations? Or would just a few more
iterations (210, 220, ...) have dramatically improved the number of solved
problems? It would also be very interesting to have a parallel version of DFPV,
as much of the code can be parallelized to further reduce the CPU time needed
to solve the problems. We should also investigate the relation between DFPV
and inheritance, in order to obtain more cooperation between the two
mechanisms. This could allow a different order in the hierarchy used in DFPV,
reducing the time required to find a maximum number of solutions.

Abstract. Recommender systems seek to furnish personalized suggestions
automatically based on user preferences. These systems use information
filtering techniques to recommend new items, and these techniques are
classified into three approaches: Content Based Filtering, Collaborative
Filtering and hybrid filtering methods. This paper presents a new hybrid
filtering approach that combines the best qualities of the kNN Collaborative
Filtering method with those of content filtering based on Modal Symbolic Data.
The main idea is to compare modal symbolic descriptions of user profiles in
order to compute the neighborhood of a user in the Collaborative Filtering
algorithm. On the Find Good Items task, measured by the half-life utility
metric, this new approach outperforms three other systems: content filtering
based on Modal Symbolic Data, kNN Collaborative Filtering based on Pearson
correlation, and the hybrid Content-Boosted Collaborative approach.



1 Introduction 

Recommender systems allow e-commerce websites to suggest products to their
customers, providing relevant information to help them in shopping tasks.
Additionally, these systems have become increasingly important in
entertainment domains [12]. For instance, some interesting applications are
personalized TV guides in digital television and music recommendation in
on-line stations.

Independently of the domain, two recommendation tasks have mainly been used by
information systems in order to reduce the information overload of their
users [7,12]. The first one is the presentation of a list of items ranked
according to their relevance to an active user. The second task is the
estimation of item relevance based on the needs of the active user. According
to [7], the former, known as Find Good Items, is the core recommendation task,
recurring in a wide variety of commercial and research systems. Furthermore, in
commercial systems the score of each item presented in a recommendation list is
usually not shown, since a small score for an item may discourage the user
from buying it.

Whatever the recommendation task, in order to execute it, recommender systems
must acquire as much information as possible about user preferences. This
information is generally stored in some machine-understandable format called
the user profile. A relevant problem remains in this process: the user does not
have enough time to give information about himself/herself. So, it is necessary
to learn user preferences from the small amount of information he/she provides,
possibly during the first use of the system, and to improve this learning in
future user interactions.

The next step is to filter relevant information in order to present it to the
user, by means of his/her previously acquired profile. The solutions proposed
for this problem can be classified into two main groups according to the kind
of filtering approach: Content Based (CB) Filtering (which is based on the
correlation between the user profile and the content of items) and
Collaborative Filtering (which is based on the correlation between user
profiles). These techniques have inherent limitations, such as the
impossibility of codifying some information in the first approach [1] and
latency (the cold-start problem) in the second one [4,9,11]. Therefore, several
works [1,4,9,10,11] have exploited hybrid recommenders to overcome the
drawbacks of each.

In this paper we present a new hybrid filtering approach combining the best
qualities of kNN Collaborative Filtering (kNN-CF) [6] with content filtering
based on Modal Symbolic (MS) Data [5]. In our method the user profile is
modeled by MS descriptions derived from the content of items previously
evaluated by the user. This is very useful for computing the neighborhood of an
active user, mainly at the beginning of the user's interactions with the
system, when there is little information about him/her. We show that our method
requires less information about the user to provide recommendation lists that
are more accurate than those generated by some other algorithms.

This new approach is evaluated by comparing it with pure content filtering
based on MS Data [5], with pure kNN-CF based on Pearson correlation [6] and
with the Content-Boosted Collaborative Filtering (CBCF) hybrid method described
in [9]. This comparison is performed in the movie recommendation domain, where
the user profile is formed by a list of items that the user either preferred or
disliked in the past, along with their respective grades.

The description of each item is a row of a data table the columns of which are 
classical or symbolic variables. Each cell of this data table may have a single category 
(e.g., the director and year attributes) or a set of categories (e.g., the genre and cast 
attributes). Table 1 shows an example of an item (movie) description. 



Table 1. Content description of a movie evaluated by some user

Variable    Variable type                        Description
Director    Single valued qualitative nominal    Steven Spielberg
Cast        Multi-valued qualitative nominal     {Tom Hanks, David Morse}
Grade       Single valued qualitative ordinal    5



2 Related Work 

The majority of the works related to hybrid information filtering (IF) methods
may be classified into one of the following three categories, where each
category has a particular strategy for combining a collaborative algorithm with
a CB one.

The first approach is to weight the predictions computed by both IF algorithms
with a suitable function in order to compute a final prediction. This idea was
first proposed in [4]. Also, [10] proposes a unified probabilistic framework
for merging collaborative and CB recommendations.

Another strategy is to estimate the scores of some unknown items in the
repository for a subset of users through a CB algorithm and then use a
Collaborative Filtering (CF) algorithm to predict new items for an active user.
The GroupLens research system [11] deployed automated agents that simulate
human behavior and calculate scores for new items in the repository,
diminishing the cold-start problem. Additionally, in [9] a CB algorithm is used
to convert a sparse ratings matrix into a full ratings matrix, and CF is then
used to provide recommendations. This technique minimizes the sparsity problem
and achieves good results in some contexts.

The third category is characterized by algorithms that build and maintain a
user profile based on the content description of items previously evaluated by
the user. These content profiles make it possible to measure the correlation
between users, which is important for defining the neighborhood in
collaborative recommendations. For instance, in Fab [1] the user profile is
based on the content of web pages previously evaluated.

In the next section, we present a new hybrid method belonging to this last
category. The main idea is to build and maintain the user profile with MS data
using the techniques defined in [5], which proposes a CB filtering method based
on the field of Symbolic Data Analysis (SDA). SDA provides suitable tools for
managing aggregated data described by multi-valued variables, where data table
entries are sets of categories, intervals, or probability distributions (for
more details about SDA see the site http://www.jsda.unina2.it/).

3 Collaborative Filtering Based on Modal Symbolic User Profiles 

The following steps are executed to generate recommendation lists in the CF 
algorithm based on MS user profiles: 

1. Construct the modal symbolic descriptions of the user profile. This step can
be done incrementally without degrading the memory usage.

2. Weight all users based on their similarity with the active user. Similarity
between users is measured by a suitable function which compares the MS
descriptions of each user profile.

3. Select the k closest users as neighbors of the active user. Closeness is
defined by the similarity between a candidate neighbor and the active user.

4. Generate a ranked list of items after computing predictions from a weighted
combination of the selected neighbors’ ratings.

Although steps 2-4 are standard in CF algorithms, the 2nd one is done in a CB
way through the MS user profiles built in the 1st step. Before detailing all
the phases of our algorithm we need to introduce some notions from the SDA field.

3.1 Symbolic Data

Data analysis takes as input a data table where the rows are the item
descriptions and the columns are the variables. A single cell of this data
table contains a single quantitative or categorical value. Real-world
information is often too complex for common data types to describe. This is why
different kinds of symbolic variables and symbolic data have been introduced
[2]. For example, a categorical multi-valued variable takes, for an item, a
subset of its domain as its value.

In this paper we are concerned with MS data. Formally, let $D_j$ be a finite set
of categories. A modal variable $y_j$ with domain $D_j$ defined on the set
$E = \{a, b, \ldots\}$ of objects is a multi-state variable where, for each
object $a \in E$, not only is a subset of its domain $D_j$ given, but also, for
each category $m$ of this subset, a weight $w(m)$ that indicates how relevant $m$
is for $a$. Formally, $y_j(a) = (S_j(a), q_j(a))$, where $q_j(a)$ is a weight
distribution defined on $S_j(a) \subseteq D_j$ such that a weight $w(m)$
corresponds to each category $m \in S_j(a)$. $S_j(a)$ is the support of the
measure $q_j(a)$ in the domain $D_j$. In our approach, a symbolic description of
an item is a vector in which each component is a weight distribution given by
an MS variable.

To illustrate the interest of this kind of data, let us consider the movie
domain. When recording movie information concerning a user profile $g$, we come
across a case where the variable “Cast” (represented here as $y_{\text{Cast}}$)
allows no single-valued answer, since the user has seen many films in the past.
Therefore, it is appropriate to record this variable as an MS variable:

$y_{\text{Cast}}(g) = (\{\text{Tom Hanks}, \text{David Morse}, \text{Demi Moore}\}, \{0.30, 0.20, 0.50\})$

which reads: Tom Hanks with a frequency of 30%, David Morse with 20% and
Demi Moore with 50%. In this example,
$S_{\text{Cast}}(g) = \{\text{Tom Hanks}, \text{David Morse}, \text{Demi Moore}\}$
is the support of the measure $q_{\text{Cast}}(g) = \{0.30, 0.20, 0.50\}$.

3.2 Building the Modal Symbolic User Profile 

The aim of this step is to construct a suitable representation of the user
profile, following the algorithm described in [5] with a few adaptations. In
[5] it was proposed that each user profile be represented by a set of MS
descriptions which synthesize the whole information given by the descriptions
of the items belonging to the user profile, taking into account the user
evaluation (grade) of each item in the profile.

The construction of the MS descriptions of the user profile involves two steps: pre- 
processing and generalization. The general idea is to build a MS description for each 
item evaluated by the user (pre-processing) and then aggregate these MS descriptions 
in some MS descriptions where each one represents a particular user interest 
(generalization). 

Pre-processing. This step aims to associate an MS description with each item.
It is necessary both for constructing the set of MS descriptions used to
represent the user profile and for comparing the user profile with a new item
(in CB filtering) or with another user profile (important for step 2 of our
recommendation algorithm).

Let $x_i = (X_i^1, \ldots, X_i^p, C(i))$ be the description of an item $i$
($i = 1, \ldots, n$), where $X_i^j \subseteq D_j$ ($j = 1, \ldots, p$) is a subset
of categories of the domain $D_j$ of the variable $y_j$ and
$C(i) \in D = \{1, \ldots, 5\}$ indicates the user evaluation (grade) for this
item. For each category $m \in X_i^j$, we can associate the following weight:

$w(m) = \frac{1}{|X_i^j|}$    (1)

where $|X_i^j|$ is the number of elements belonging to $X_i^j$ (its cardinality).
Then, the MS description of item $i$ is
$\tilde{x}_i = (\tilde{X}_i^1, \ldots, \tilde{X}_i^p, C(i))$, where
$\tilde{X}_i^j = (S_j(i), q_j(i))$ is an MS variable and $S_j(i) = X_i^j$ is the
support of the weighted distribution $q_j(i)$.

The MS description of the item of Table 1 is displayed in Table 2. In this
example, $S_{\text{Cast}}(i) = \{\text{Tom Hanks}, \text{David Morse}\}$ and
$q_{\text{Cast}}(i) = \{0.5, 0.5\}$. Notice that the grade variable is not
pre-processed: it is used to evaluate the item, not to describe it.



Table 2. Content description of a movie evaluated by some user

Variable    Description
Director    ({Steven Spielberg}, {1.0})
Cast        ({Tom Hanks, David Morse}, {0.5, 0.5})
Grade       5
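
The pre-processing step can be illustrated with a short Python sketch that gives every category of a variable the uniform weight of equation (1), as in the Cast row of Table 2. The function name `preprocess_item` and the dictionary-based representation are assumptions made for the example, not part of the method described in [5].

```python
from typing import Dict, Iterable


def preprocess_item(item: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, float]]:
    """Turn each variable's set of categories into a uniform weight distribution.

    The grade is assumed to be stored separately, since it is not pre-processed.
    """
    description: Dict[str, Dict[str, float]] = {}
    for variable, categories in item.items():
        unique = list(dict.fromkeys(categories))   # drop duplicates, keep order
        description[variable] = {m: 1.0 / len(unique) for m in unique}
    return description


movie = {"Director": ["Steven Spielberg"], "Cast": ["Tom Hanks", "David Morse"]}
ms_description = preprocess_item(movie)
```

Here the keys of each inner dictionary play the role of the support and the values play the role of the weight distribution.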



Generalization. After pre-processing the items evaluated by the user, we are able to 
construct the symbolic descriptions of his/her profile. In our approach, each user 
profile is formed by a set of sub-profiles. Each sub-profile is modeled by a MS 
description that summarizes the entire body of information taken from the set of items 
the user has evaluated with the same grade. 

Formally, let $u_g$ be the sub-profile of user $u$ which is formed by the set of
items which have been evaluated with grade $g$. Let
$y_{u_g} = (Y^1_{u_g}, \ldots, Y^p_{u_g})$ be the MS description of the
sub-profile $u_g$, where $Y^j_{u_g} = (S_j(u_g), q_j(u_g))$, with $S_j(u_g)$ being
the support of the weighted distribution $q_j(u_g)$, $j = 1, \ldots, p$.

If $\tilde{x}_i = (\tilde{X}_i^1, \ldots, \tilde{X}_i^p, C(i))$, where
$\tilde{X}_i^j = (S_j(i), q_j(i))$ ($j = 1, \ldots, p$), is the MS description of
the item $i$ belonging to $u_g$, the support $S_j(u_g)$ of $q_j(u_g)$ is defined as

$S_j(u_g) = \bigcup_{i \in u_g} S_j(i)$    (2)

Let $m \in S_j(u_g)$ be a category belonging to $D_j$. Then, the weight
$W(m) \in q_j(u_g)$ of the category $m$ is defined as

$W(m) = \frac{1}{|u_g|} \sum_{i \in u_g} \delta(i, m)$, with
$\delta(i, m) = \begin{cases} w(m) \in q_j(i), & \text{if } m \in S_j(i) \\ 0, & \text{otherwise} \end{cases}$    (3)

where $|u_g|$ is the number of elements belonging to the set $u_g$.
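
As a small worked illustration of equations (2) and (3), the following Python sketch aggregates the MS descriptions of the items sharing one grade into a sub-profile. The function name `build_sub_profile` and the dictionary layout (variable name to category weights) are assumptions made for the example.

```python
from typing import Dict, List

Description = Dict[str, Dict[str, float]]  # variable -> {category: weight}


def build_sub_profile(items: List[Description]) -> Description:
    """Average the item weight distributions variable by variable.

    The keys of each resulting distribution form the union of the item
    supports (equation 2); each value is the mean item weight, counting
    items that lack the category as zero (equation 3).
    """
    n = len(items)
    profile: Description = {}
    for item in items:
        for variable, distribution in item.items():
            accumulated = profile.setdefault(variable, {})
            for category, weight in distribution.items():
                accumulated[category] = accumulated.get(category, 0.0) + weight / n
    return profile


grade_5_items = [
    {"Cast": {"Tom Hanks": 0.5, "David Morse": 0.5}},
    {"Cast": {"Tom Hanks": 1.0}},
]
sub_profile = build_sub_profile(grade_5_items)
```

With these two hypothetical items, Tom Hanks receives (0.5 + 1.0) / 2 = 0.75 and David Morse receives (0.5 + 0) / 2 = 0.25.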
3.3 Comparing Modal Symbolic Profiles

In this step we describe a suitable function that measures the similarity
between the MS descriptions of user profiles. The hypothesis behind this
function is that two users have similar preferences if each pair of their
sub-profiles evaluated with the same grade is similar. We use this function to
define the neighborhood of an active user.

Let $y_{u_g} = (Y^1_{u_g}, \ldots, Y^p_{u_g})$ be the MS description of the
sub-profile $u_g$ of some active user. Also, let
$y_{v_g} = (Y^1_{v_g}, \ldots, Y^p_{v_g})$ be the MS description of the
sub-profile $v_g$ of some candidate neighbor of the active user. The comparison
between the active user $u$ and the candidate neighbor $v$ is achieved by the
following similarity function:






$\psi(u, v) = [\ldots]$, $g \in \{1, 2, 3, 4, 5\}$    (4)



where the function $\phi(y_{u_g}, y_{v_g})$, $g \in \{1, 2, 3, 4, 5\}$, has two
components: a context free component, which compares the sets $S_j(u_g)$ and
$S_j(v_g)$, and a context dependent component, which compares the weight
distributions $q_j(u_g)$ and $q_j(v_g)$. The dissimilarity function $\phi$ first
compares two MS descriptions variable-wise, taking position and content
differences into account, and then aggregates the partial comparison results.
Other dissimilarity functions suitable for comparing MS descriptions are
presented in [2], but none of them takes into account the differences in
position that arise when the feature type is ordered. This function is defined as:

$\phi(y_{u_g}, y_{v_g}) = \frac{1}{p} \sum_{j=1}^{p} \left[ \phi_{cf}(S_j(u_g), S_j(v_g)) + \phi_{cd}(q_j(u_g), q_j(v_g)) \right]$    (5)



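where $\phi_{cf}$ measures the difference in position in cases where the sets
$S_j(u_g)$ and $S_j(v_g)$ are ordered, and $\phi_{cd}$ measures the difference in
content between $y_{u_g}$ and $y_{v_g}$.

Table 3 expresses the agreement ($\alpha$ and $\beta$) and disagreement
($\gamma$ and $\delta$) between the weight distributions $q_j(u_g)$ and
$q_j(v_g)$. Here $W_{u_g}(m)$ and $W_{v_g}(m)$ denote the weight of category $m$
in $q_j(u_g)$ and in $q_j(v_g)$, respectively, and $\overline{S}$ denotes the
complement of $S$ in $D_j$.

Table 3. Comparison between the weight distributions q_j(u_g) and q_j(v_g)
(+ : Agreement, - : Disagreement)

User u_g    User v_g    Quantity
+           +           $\alpha = \sum_{m \in S_j(u_g) \cap S_j(v_g)} W_{u_g}(m)$
+           +           $\beta = \sum_{m \in S_j(u_g) \cap S_j(v_g)} W_{v_g}(m)$
+           -           $\gamma = \sum_{m \in S_j(u_g) \cap \overline{S_j(v_g)}} W_{u_g}(m)$
-           +           $\delta = \sum_{m \in \overline{S_j(u_g)} \cap S_j(v_g)} W_{v_g}(m)$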





The context dependent component $\phi_{cd}$ is defined as:

$\phi_{cd}(q_j(u_g), q_j(v_g)) = \frac{1}{2} \left( \frac{\gamma + \delta}{\alpha + \gamma + \delta} + \frac{\gamma + \delta}{\beta + \gamma + \delta} \right)$    (6)

If the domain $D_j$ of the categorical variable $y_j$ is ordered, let
$m_L = \min(S_j(u_g))$, $m_U = \max(S_j(u_g))$, $c_L = \min(S_j(v_g))$ and
$c_U = \max(S_j(v_g))$. The join [8] $S_j(u_g) \oplus S_j(v_g)$ is defined as:

$S_j(u_g) \oplus S_j(v_g) = \begin{cases} S_j(u_g) \cup S_j(v_g), & \text{if the domain } D_j \text{ is non ordered} \\ \{\min(m_L, c_L), \max(m_U, c_U)\}, & \text{otherwise} \end{cases}$    (7)

The context free component $\phi_{cf}$ is defined as:

$\phi_{cf}(S_j(u_g), S_j(v_g)) = \begin{cases} 0, & \text{if } S_j(u_g) \cap S_j(v_g) \neq \emptyset \\ \frac{|S_j(u_g) \oplus S_j(v_g)| - |S_j(u_g)| - |S_j(v_g)|}{|S_j(u_g) \oplus S_j(v_g)|}, & \text{otherwise} \end{cases}$    (8)


Now that we are able to compute the similarity between the active user u and each
user in the database, we can perform the third step in a straightforward manner.
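The two components of the dissimilarity can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the join on an ordered domain is assumed to span every domain value between the two extremes, the position component is assumed to be the normalized gap between two disjoint sets, and the content component is assumed to average the two disagreement ratios with a factor of 1/2.

```python
from typing import Dict, Hashable, Sequence, Set


def join(s_u: Set[Hashable], s_v: Set[Hashable],
         domain: Sequence[Hashable], ordered: bool) -> Set[Hashable]:
    """Join of two non-empty category sets (Eq. 7)."""
    if not ordered:
        return s_u | s_v
    # Assumption: on an ordered domain the join spans every domain value
    # between the smallest and the largest category found in either set.
    positions = [domain.index(c) for c in s_u | s_v]
    return set(domain[min(positions):max(positions) + 1])


def phi_cf(s_u: Set[Hashable], s_v: Set[Hashable],
           domain: Sequence[Hashable], ordered: bool) -> float:
    """Position component: 0 if the sets overlap, else the normalized gap."""
    if s_u & s_v:
        return 0.0
    joined = join(s_u, s_v, domain, ordered)
    return (len(joined) - len(s_u) - len(s_v)) / len(joined)


def phi_cd(q_u: Dict[Hashable, float], q_v: Dict[Hashable, float]) -> float:
    """Content component built from agreement and disagreement weights."""
    shared = q_u.keys() & q_v.keys()
    alpha = sum(q_u[m] for m in shared)   # agreement, weights of u
    beta = sum(q_v[m] for m in shared)    # agreement, weights of v
    gamma = sum(w for m, w in q_u.items() if m not in shared)  # only in u
    delta = sum(w for m, w in q_v.items() if m not in shared)  # only in v
    disagreement = gamma + delta
    if disagreement == 0:
        return 0.0
    # Assumption: the two ratios are averaged (factor 1/2).
    return 0.5 * (disagreement / (alpha + disagreement)
                  + disagreement / (beta + disagreement))
```

For instance, with release years as an ordered domain, the sets {1990, 1991} and {1995} are disjoint and their join covers six years, so the position component is (6 - 2 - 1) / 6 = 0.5. On a non-ordered domain the join is the union, so the position component of two disjoint sets is 0.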



3.4 Generating a Ranked List of Items 



The goal of this step is to generate a ranked list of items according to user needs. To
achieve this goal we need to compute predictions for each unknown item in
the repository using the neighborhood found in step 3. Therefore, consider the
following function p, which measures the relevance of an item i to an active user u:



$$p(u, i) = \frac{1}{k} \sum_{v=1}^{k} (r_{v,i} - \bar{r}_v) \cdot \psi(u, v) \qquad (9)$$



where k is the neighborhood size. At this point, we can sort the list of items according
to the values produced by Eq. 9 and present this ranked list to the user.
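The ranking step can be illustrated with a short Python sketch. It assumes that the relevance function averages, over the k neighbors, each neighbor's mean-centered rating of the item weighted by that neighbor's similarity to the active user. The names `predict` and `rank_items` are hypothetical, and the choice to let neighbors who have not rated the item contribute zero is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Each neighbor is (similarity to the active user, that neighbor's ratings).
Neighbor = Tuple[float, Dict[str, float]]


def predict(neighbors: List[Neighbor], item: str) -> float:
    """Relevance of `item`: similarity-weighted, mean-centered ratings."""
    total = 0.0
    for similarity, ratings in neighbors:
        if item in ratings:  # assumption: unrated items contribute nothing
            mean_rating = sum(ratings.values()) / len(ratings)
            total += (ratings[item] - mean_rating) * similarity
    return total / len(neighbors)


def rank_items(neighbors: List[Neighbor], candidates: List[str]) -> List[str]:
    """Sort candidate items from most to least relevant."""
    return sorted(candidates, key=lambda i: predict(neighbors, i), reverse=True)
```

With two neighbors, `[(1.0, {"a": 5, "b": 1}), (0.5, {"a": 4, "c": 2})]`, item "a" is rated above both neighbors' means and is therefore ranked first.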



4 Experimental Evaluation 

As described in Section 1, our case study is the movie recommendation domain. We use
the MovieLens¹ dataset joined with a content database crawled from IMDB² to
perform experimental tests. The prepared dataset contains 91190 ratings, on a scale
from 1 (the worst grade) to 5 (the best grade), given by 943 different users to 1466
movies. In order to set up our environment we must answer three questions:

1. How much does the system know about a user when it provides recommendations?

2. What is the recommendation task we would like to evaluate?

3. How can we evaluate this task and compare the performance of different systems?



1 http://movielens.umn.edu 

2 http://www.imdb.com 
Concerning the first question, in this paper we are interested in evaluating the
utility of a recommender system at the beginning of user interactions. In this
scenario, the user has not provided much information about himself/herself, first
because there has not been much time to do so, and second because it is not a good
strategy for an information system to ask the user many questions, since he/she
might leave the system and never come back. Therefore, we think it is reasonable for
the user to evaluate about 5, or at most 10, items in the first contact.

Concerning the second question, our system is asked to furnish ordered lists
(the Find Good Items task). This decision was motivated by the hypothesis that in an
e-commerce environment this task is more useful than the other tasks available to
recommender systems [7,12].

Concerning the third question, according to [7], an adequate rank accuracy
metric is the half-life utility metric proposed by [3]. This evaluation metric was
specially designed for tasks such as Find Good Items. Its most important
advantage is that it measures the utility of a sorted list while taking into account
that the user generally looks only at the first results of the list. It therefore assumes
that each successive item on a list is less likely to be viewed by the user,
following an exponential decay. Fundamentally, the utility of a sorted list for a
user is based on the probability of each item being seen by the user, multiplied
by the utility of the item itself. See [3, 7] for additional details on the half-life utility metric.
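As a rough illustration of the metric, the sketch below follows the usual formulation of half-life utility: the item at rank j contributes its vote above a default vote, discounted by 2 raised to (j - 1) / (half-life - 1), and the total is normalized by the utility of the best possible ordering. The function names, the default vote of 3 and the half-life of 5 are assumptions of this sketch; the paper does not state the values it used.

```python
from typing import List, Sequence


def list_utility(ranked_votes: Sequence[float],
                 default_vote: float = 3.0, half_life: int = 5) -> float:
    """Expected utility of one ranked list (true votes in ranked order)."""
    # enumerate() starts at 0, which equals (j - 1) for 1-based rank j.
    return sum(max(vote - default_vote, 0.0) / 2 ** (rank / (half_life - 1))
               for rank, vote in enumerate(ranked_votes))


def half_life_score(lists_per_user: List[Sequence[float]],
                    default_vote: float = 3.0, half_life: int = 5) -> float:
    """Utility over all users, as a percentage of the best achievable."""
    achieved = sum(list_utility(v, default_vote, half_life)
                   for v in lists_per_user)
    best = sum(list_utility(sorted(v, reverse=True), default_vote, half_life)
               for v in lists_per_user)
    return 100.0 * achieved / best if best > 0 else 0.0
```

A list whose items already appear in decreasing order of the user's true votes scores 100; placing well-liked items further down the list lowers the score exponentially with their rank.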

We can now describe the methodology used to perform our experiments. First,
we selected all users who have evaluated at least 100 of the 1466 movies
available. These users formed the test set for four different experiments,
which vary the number m = {5, 10} of items provided in the training set for each user and
the neighborhood size k = {30, 50} used in the collaborative and hybrid filtering
algorithms. Additionally, we ran an adapted version of the standard stratified 10-fold
cross-validation methodology [13 (pages 125:127)]. The adaptation
consists in arranging the training set and test set in the proportion of 1 to
9, respectively, instead of 9 to 1 as in the standard scheme. This is compatible with the
fact that the user does not furnish much information in his/her first contact with
the system. The following algorithms were executed in our tests:

1. (MSA) - Content-Based Information Filtering based on MS Data; 

2. (CFA) - kNN-CF based on Pearson Correlation; 

3. (CBCF) - Content-Boosted Collaborative Filtering using as CB predictor the MSA;

4. (CMSA) - Collaborative Filtering based on Modal Symbolic User Profiles. 

Table 4 shows the average (μ) and standard deviation (σ) of the half-life
utility metric for all algorithms, grouped by k = {30, 50} and m = {5, 10}. The
table also presents the observed t-statistic values [13 (pages 129:132)] for the
two-tailed test, based on 9 degrees of freedom, between the average behavior of the
CMSA method with k = 50 and m = 5 and the average behavior of the other methods (MSA, CFA
and CBCF).
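The 9 degrees of freedom are consistent with a paired comparison over the 10 cross-validation folds. Assuming that this is the test used (the text does not spell it out), the statistic can be sketched as follows; `paired_t_statistic` is a hypothetical helper name.

```python
import math
from typing import Sequence


def paired_t_statistic(scores_a: Sequence[float],
                       scores_b: Sequence[float]) -> float:
    """t statistic of the per-fold differences (n - 1 degrees of freedom)."""
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    # Undefined (division by zero) when all differences are identical.
    return mean_diff / math.sqrt(variance / n)
```

With 10 folds, the resulting value would be compared against the two-tailed critical value of the t distribution with 9 degrees of freedom.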
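Table 4. Results of experiments grouped by k (size of neighborhood) and m (number of items
in user profile) according to half-life utility metric

| k  | m  | MSA μ | MSA σ | MSA t | CFA μ | CFA σ | CFA t | CBCF μ | CBCF σ | CBCF t | CMSA μ | CMSA σ |
|----|----|-------|-------|-------|-------|-------|-------|--------|--------|--------|--------|--------|
| 30 | 5  | 34.34 | 0.78  | 47.28 | 40.21 | 4.32  | 11.91 | 20.51  | 7.30   | 16.16  | 44.67  | 11.87  |
| 30 | 10 | 31.84 | 1.11  | 65.70 | 58.09 | 1.99  | 6.32  | 42.14  | 10.22  | 6.54   | 53.49  | 9.71   |
| 50 | 5  | 34.34 | 0.78  | 47.28 | 40.21 | 4.32  | 11.91 | 20.51  | 7.30   | 16.16  | 63.87  | 2.21   |
| 50 | 10 | 31.84 | 1.11  | 65.70 | 58.09 | 1.99  | 6.32  | 42.14  | 10.22  | 6.54   | 63.57  | 2.00   |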



As expected, the performance of the MSA algorithm is independent of the value of
k. The same holds for the CFA and CBCF algorithms. In fact, as
pointed out by [6], the quality of the recommendations produced by CF systems does not
improve significantly when the number of neighbors is higher than 30.

On the other hand, setting k to 50 is decisive for improving the quality of the
CMSA system. Moreover, having 50 neighbors and just 5 items in the user
profile is sufficient for CMSA to produce better recommendation lists than every other
algorithm considered in our analysis (at a significance level of 0.1%). It is important
to note that in real systems reaching 50 neighbors is not a problem, since these systems
easily keep thousands of users. Furthermore, providing good recommendations with just
5 items can help systems retain loyal customers and win new ones, mainly
because it is not necessary to acquire much information about the user in the first meetings.

Our method is able to learn more about the user when there is little information about
him/her because it uses content information (and grades) to find the user's
neighbors, which is richer than using just the items' identifiers (and grades) as in CFA.

Another interesting result is that the observed standard deviation of CFA and
CMSA diminishes when the size of the user profile is increased to 10. The reason for this
behavior is that, as more items are added to the user profile, the
estimate of the user's neighborhood improves and, consequently, better recommendations can be
provided to users whose profile was obscure when it contained
just 5 items. A more remarkable result is that CMSA with k = 50 reaches low standard
deviations, implying a more stable system even in the presence of unusual users.

Notice that the CBCF system has the worst performance when there are 5 items in
the user profile. This happens because CBCF has too little information to estimate
the grades of all other items in the database for the active user, which is a step of this
algorithm. When there is more information about the user (10 items in the user
profile), the performance is almost twice that of the previous case.

As a final remark, it is not surprising that the MSA algorithm has worse
prediction accuracy than CFA and CMSA, because CB systems traditionally perform
worse than collaborative algorithms. In fact, CB filtering is
interesting as a way to overcome some problems of CF systems, such as cold start.
Investigating this behavior would require another experiment, which is
outside the scope of this work.
5 Final Comments and Conclusions

In the present work we describe a new IF method that uses the fundamental idea of
the Information Filtering Based on Modal Symbolic Objects approach [5] to build rich
user profiles. We propose a suitable similarity function that is able to compare two
MS user profiles. Through the proposed function we are able to select the best
neighbors of an active user in order to perform the kNN-CF algorithm.

The proposed method was evaluated under the same conditions as three other
information filtering algorithms. To perform this evaluation, we prepared an
experimental environment based on the MovieLens and IMDB databases. The
experimental environment was defined taking into account real scenarios of
information systems, such as the lack of information about the user at the beginning of
system usage. An appropriate metric and methodology were used to measure the
prediction accuracy of the systems in the Find Good Items task [7].

We show that our new method improves the quality of recommendation lists when
there is very little information about the user. If we suppose that the user provides
evaluations for 5 items in the first contact, which is very acceptable in real life, the
system is able to supply good recommendation lists, which can motivate the user to
come back to the system more often, increasing loyalty.

Acknowledgments. The authors would like to thank CNPq (Brazilian Agency) for its 
financial support. 
Abstract. Constructive Induction methods aim to solve the problem of
learning hard concepts despite complex interaction in data. We propose
a new Constructive Induction method based on Genetic Algorithms with
a non-algebraic representation of features. The advantage of our method
over other similar methods is that it constructs and evaluates a combination
of features. Evaluating constructed features together, instead of
considering them one by one, is essential when the number of interacting
attributes is high and there is more than one interaction in the concept. Our
experiments show the effectiveness of this method in learning such concepts.



1 Introduction 

Learning algorithms based on similarity assume that cases belonging to the same 
class are located close to each other in the instance space defined by original 
attributes. So, these methods attain high accuracy on domains where the data 
representation is good enough to maintain the closeness of instances of the same 
class, such as those provided in Irvine databases [1]. But for hard concepts with 
complex interaction, each class is scattered through the space due to low-level 
representation [2,3]. Interaction means the relation between one attribute and 
the target concept depends on another attribute. When the dependency is not 
constant for all values of the other attribute, the interaction is complex [4]. 
Figure 1 shows an example of interaction and complex interaction between two 
attributes in instance space. The complex interaction has been seen in real-world 
domains such as protein secondary structure [5]. 

Constructive induction (CI) methods have been introduced to ease the
attribute interaction problem. Their goal is to automatically transform the original
representation space of hard concepts into a new one where the regularity is more
apparent [6,7]. This goal is achieved by constructing new features from the given
attribute set to abstract the interaction among several attributes into a new one.

Most CI methods apply a greedy search to find new features. But because
of the attributes' interaction, the search space for constructing new features has
more variation. Recent works on problems with interaction [8, 3] show that a
global search strategy such as Genetic Algorithms (GA) [9] is more likely to be
successful in searching through the intractable and complicated search space [10].

Fig. 1. Interaction and complex interaction in instance space

Another limitation of many CI methods is the language they apply for representing
constructed features. These methods use an algebraic form of representing
features. By algebraic form we mean that features are expressed by means of algebraic
operators such as arithmetic or Boolean operators. In contrast to this kind
of representation, we have the non-algebraic form of representation, which means no
operator is used for representing the feature. For example, for a Boolean attribute
set $\{X_1, X_2\}$ an algebraic feature like $(X_1 \wedge \bar{X}_2) \vee (\bar{X}_1 \wedge X_2)$ can be represented by
a non-algebraic feature such as (0110), where the $j$th element in (0110) represents
the outcome of the function for the $j$th combination of attributes $X_1$ and $X_2$ according
to the truth table. This form of representation has been used in [11] and [12]
for learning. We showed in [8] that a complex algebraic expression is required
to capture and encapsulate the interaction into a feature, while non-algebraic
representation reduces the difficulty of constructing complex features. For this
reason, and in spite of their use of a GA search strategy, some CI methods such
as GCI [13], GPCI [14], Gabret [15] and the hybrid method of Ritthoff et al. [16]
fail when a high-order complex interaction exists among attributes.
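The correspondence between the two forms can be made concrete with a small Python sketch. The helper name `truth_table_feature` is hypothetical; it simply lists the outcome of a Boolean function for every combination of attribute values in truth-table order.

```python
from itertools import product
from typing import Callable


def truth_table_feature(func: Callable[..., bool], n_attrs: int) -> str:
    """Non-algebraic form: one output bit per attribute-value combination."""
    # product() enumerates combinations in truth-table order: 00, 01, 10, 11.
    return "".join(str(int(bool(func(*bits))))
                   for bits in product((0, 1), repeat=n_attrs))


def xor_feature(x1: int, x2: int) -> bool:
    """Algebraic form: (X1 and not X2) or (not X1 and X2)."""
    return bool((x1 and not x2) or (not x1 and x2))
```

Calling `truth_table_feature(xor_feature, 2)` yields the string "0110" from the example above. The non-algebraic form grows exponentially with the number of attributes, but it needs no search over operators.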

There are very few methods that use a non-algebraic form of representation;
among them are MRP [17] and DCI [18]. These CI methods produced high
accuracy on complex concepts with high interaction. However, they have
limitations and deficiencies. MRP represents features by sets of tuples obtained from
training data using relational projection, and applies a greedy search in order to
construct features. Each constructed feature is used as a condition to split data
in two parts and then, for each part, new features are generated. Therefore, only
one feature is constructed and evaluated at a time. DCI applies GA to reduce
the problem of local optima. However, it still evaluates features one at a time
and is incapable of constructing more than one feature. Therefore, both methods
construct and evaluate new features individually.

When the number of interacting attributes is high, the feature that
encapsulates the interaction is complex and difficult to construct. A CI method
must break down the complex interaction into several smaller features. Such a
set of features works as a theory of intermediate concepts that bridge the gap
from the input data representation to the hard target concept. In that case,
each feature that partially shows the interaction does not, by itself, give enough
information about the concept and may be considered an irrelevant feature. In
order to see the goodness of new features, the CI method should evaluate the
combination of features together as related parts of the theory. For this reason
MRP, DCI, and other similar methods fail when the number of interacting attributes
grows and the interaction is complex (see Sect. 3).

In this paper, we introduce MFE/GA, a Multi-feature Extraction method
based on GA and a non-algebraic form of representation, for constructing a set
of new features to highlight the interaction that exists among attributes. Our
experiments show the advantage of our method over other CI methods when
the number of interacting attributes grows and the interaction is complex.



2 GA Design 

MFE/GA uses GA to generate and combine features defined over different sub- 
sets of original attributes. The current version of the method only works with 
nominal attributes. Therefore, any continuous attribute should be converted to 
nominal attributes before running MFE/GA. The GA receives training data rep- 
resented by original attributes set and finds subsets of interacting attributes and 
features representing the interaction. When the GA finishes, the new features 
are added to the attributes set and the new representation of data is given to a 
standard learner for learning. This section explains the system design. 

2.1 Individuals Representation 

For constructing new features we need to perform two tasks: finding the subset
of interacting attributes and generating a function defined over each subset. Our
individuals are sets of subsets of primitive attributes such as $Ind = (S_1, S_2, \ldots, S_k)$
where $S_i \subseteq S$, $S_i \neq \emptyset$, and $S$ is the set of original attributes.

Subsets in individuals are represented by bit-strings of length $n$, where $n$ is
the number of original attributes; each bit shows the presence or absence of
the attribute in the subset. Therefore each individual is a bit-string of length
$k \cdot n$ ($k > 0$) such as $Ind = (b_1, \ldots, b_n : b'_1, \ldots, b'_n : b''_1, \ldots)$.

Since each individual has a different number of subsets, the length of individuals
is variable. To avoid unnecessary growth of individuals we have limited the
number of subsets in individuals so that $k \leq 5$.

Each subset in an individual is associated with a function that is extracted from
data. Thus each individual actually represents a set of functions such as
$\{F_1, F_2, \ldots, F_k\}$, where $F_i$ is a function defined over $S_i$. It is important to note
that during mutation and crossover, if a subset is changed, the associated function
is also changed, since a new $F'_i$ is extracted for the new subset $S'_i$ in the offspring.

The function $F_i$ created for any given subset $S_i = \{X_{i1}, \ldots, X_{im}\}$ in an
individual uses a non-algebraic form of representation (see Sect. 1). As explained
in [8], this form of representation reduces the difficulty of constructing complex
features. Each $F_i$ is defined by assigning Boolean class labels to all the tuples
in the Cartesian product $X_{i1} \times \ldots \times X_{im}$. The class assigned to each tuple $t$
depends on the training samples that match the tuple, that is, the training samples
whose values for attributes in $S_i$ are equal to the corresponding values in tuple $t$.
Fig. 2. Space of samples defined by attributes in subset $S_i$ (figure labels include: unseen area, Case 1; negative area, Case 2)

More precisely, the class assigned depends on the class labels of all those training
samples matching the tuple, as discussed next case by case:

Case 1. If there are no training samples matching $t$, a class label is assigned
to $F_i(t)$ stochastically, according to the class distribution in the training data.

Case 2. If all training samples matching $t$ belong to the same class, this is
the class assigned to $F_i(t)$.

Case 3. If there is a mixture of classes in the samples matching $t$, the class
assigned to $F_i(t)$ depends on the numbers of tuples labelled by Case 2 as
positive and negative, $p_2$ and $n_2$ respectively. In particular, if $p_2 = n_2 = 0$,
the class is assigned stochastically (as in Case 1); if $p_2 > n_2$, the negative
class is assigned; and otherwise, the positive class is assigned.

This procedure for extracting the definition of $F_i$ from data partitions the
subspace defined by $S_i$ into four areas, as illustrated in Fig. 2. Each $F_i$ identifies
similar patterns (Case 2) of interaction among $S_i$ and compresses them into the
negative or positive area. The unseen area (Case 1) is covered by stochastically
predicting the most frequent class. Note that this covering of the unseen area means
generalization, and thus may involve prediction errors.
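The three cases above can be sketched in Python as follows. This is a minimal illustration under assumptions rather than the authors' implementation: `extract_function` and its argument layout are hypothetical, class labels are Booleans, and the stochastic cases draw the positive class with the probability of its frequency in the training data.

```python
import random
from collections import defaultdict
from itertools import product


def extract_function(samples, subset, domains, rng):
    """Build F_i as a dict mapping each tuple over `subset` to a Boolean class.

    samples: list of (attribute-value dict, Boolean class label)
    subset:  attribute names forming S_i
    domains: dict from attribute name to its list of possible values
    rng:     random.Random instance used for the stochastic cases
    """
    matching = defaultdict(list)
    for attrs, label in samples:
        matching[tuple(attrs[a] for a in subset)].append(label)

    positive_rate = sum(label for _, label in samples) / len(samples)

    def draw():
        # Assumption: sample a class following the training distribution.
        return rng.random() < positive_rate

    func, mixed = {}, []
    p2 = n2 = 0  # tuples labelled positive / negative by Case 2
    for t in product(*(domains[a] for a in subset)):
        labels = matching.get(t, [])
        if not labels:                    # Case 1: unseen tuple
            func[t] = draw()
        elif len(set(labels)) == 1:       # Case 2: pure tuple
            func[t] = labels[0]
            if labels[0]:
                p2 += 1
            else:
                n2 += 1
        else:                             # Case 3: decided after the scan
            mixed.append(t)
    for t in mixed:
        if p2 == 0 and n2 == 0:
            func[t] = draw()
        elif p2 > n2:
            func[t] = False
        else:
            func[t] = True
    return func
```

Mixed tuples are resolved only after all pure tuples have been counted, because Case 3 depends on the totals $p_2$ and $n_2$ produced by Case 2.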

As we will see next, the GA's fitness evaluation is applied to individuals
composed of several $F_i$, each defined over a subset $S_i$. Thus, each individual
encapsulates several interactions into features, and this allows the GA to
simultaneously construct and evaluate features, which turns out to be essential when
several high-order interactions exist in data.

2.2 Fitness Function 

After feature extraction, for each $Ind = (S_1, \ldots, S_k)$, data are projected onto the
set of new features $\{F_1, \ldots, F_k\}$ and the goodness of the individual is evaluated
by the following formula:

Fitness(Ind) = - ^1, ~ *+l g + h + ^ + D + £l|! (1 ) 

where $r^+$ is the set of positive tuples and $r^-$ is the set of negative tuples obtained
by projecting data onto $\{F_1, \ldots, F_k\}$, $r$ is the total number of tuples in the training
data, the single bars $|z|$ denote the number of attributes (or tuples) in subset (or
relation) $z$, and the double bars $\|p\|$ denote the number of examples in the training
data that match the tuples in relation $p$. The objective of the GA is to minimize
the value of Fitness(Ind). The first term in this formula estimates how good the
set of new features is for classifying data. It is divided by $k + 1$ to favor individuals
with a larger number of subsets. The aim is to prefer several simple features to
a few complex features. The complexity of features is evaluated in the last term
by measuring the fraction of attributes participating in constructing features.

To reduce overfitting, we use 90% of training data for generating functions 
and all training data for fitness evaluation. Aside from that, the empirical eval- 
uation of the system will be based on unseen data (see Sect. 3). 

2.3 GA Operators 

The genetic operators have the role of driving the GA toward the optimal solution. We use
mutation and crossover operators. Our objective is to generate different subsets
of attributes with their associated functions and combine them to eventually find
the group of functions defined over subsets of interacting attributes. To achieve
this objective, we apply operators at two levels: the attributes level, to generate
different subsets and features; and the subsets level, to make different combinations of
subsets and features. Therefore we have two types of mutation and two types of
crossover, illustrated in Fig. 3, where colons are used to separate subsets of
attributes, bars mark the crossover points, and underlined are genes in the parents
that will be substituted by the operator.

Mutation Type-1 (Mutation in Attributes Level): This operator is performed
by considering the individual as a bit-string of size $k \cdot n$, where $k$ is the
number of subsets and $n$ is the number of original attributes. The traditional
mutation is applied over the bit-string to flip bits of the string. By flipping a bit,
we eliminate/add an attribute from/to any subset in any given individual.

Mutation Type-2 (Mutation in Subsets Level): For this operator an individual
is considered as a sequence of subsets, and mutation is performed over any
subset as a whole, by replacing the subset with a newly generated subset. Therefore,
after this operation, some subsets are eliminated from the given individual and
new subsets are added to produce a new combination of subsets.

Crossover Type-1 (Crossover in Attributes Level): This operator applies
classical two-point crossover considering the individual as a bit-string. Its aim is
to generate new subsets by recombining segments of subsets from the parents.
The two crossing points in the first parent are selected randomly. On the second
parent, the crossing points are selected randomly, subject to the restriction
that they must have the same distance from the subset boundaries in the bit-string
representation as they had in the first parent [19]. Depending on where the crossing
points are situated, this operator may generate new subsets from subsets of the
parents and/or recombine subsets. This operator may change the length of the
individual, but we impose the limitation $k \leq 5$; if it is violated, new crossing points
are selected until the produced offspring have $k \leq 5$ subsets.

Mutation in Attributes Level
  Parent1 = (10010010:01010100:00010111)
  Child1  = (11010011:01110100:00000111)

Mutation in Subsets Level
  Parent2 = (S1, S2, S3, S4, S5)
  Child2  = (S1, S'2, S3, S4, S5)

Crossover in Attributes Level
  Parent3 = (1001|0010:010|10100:00010111)
  Parent4 = (0010|1011:10010001:111|01100)
  Child3  = (10011011:10010001:11110100:00010111)
  Child4  = (00100010:01001100)

Crossover in Subsets Level
  Parent5 = (S11, S12)                  Mask1 = (10)
  Parent6 = (S21, S22, S23, S24, S25)   Mask2 = (01101)
  Child5  = (S11, S22, S23, S25)
  Child6  = (S12, S21, S24)

Fig. 3. GA operators

Crossover Type-2 (Crossover in Subsets Level): The aim of this operator is
to generate different combinations of subsets by exchanging subsets of the parents.
It considers individuals as sequences of subsets and performs uniform crossover.
Two crossover masks are generated randomly to define the cutting points. This
operator may change the length of the individuals and therefore has the same
restriction of $k \leq 5$ as above. Crossover Type-2 only recombines subsets and does
not generate any new subset.
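The subsets-level crossover can be sketched in a few lines of Python, following the example in Fig. 3: the first child collects the subsets whose mask bit is 1 in either parent, and the second child collects the rest. The function name `subset_crossover` and the convention of returning `None` when a child is empty or exceeds the size limit (so that the caller can draw new masks) are assumptions of this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def subset_crossover(parent_a: Sequence[str], parent_b: Sequence[str],
                     mask_a: Sequence[int], mask_b: Sequence[int],
                     max_subsets: int = 5
                     ) -> Optional[Tuple[List[str], List[str]]]:
    """Uniform crossover over whole subsets, driven by one mask per parent."""
    child_1 = ([s for s, bit in zip(parent_a, mask_a) if bit]
               + [s for s, bit in zip(parent_b, mask_b) if bit])
    child_2 = ([s for s, bit in zip(parent_a, mask_a) if not bit]
               + [s for s, bit in zip(parent_b, mask_b) if not bit])
    # Assumption: invalid offspring are signalled so new masks can be drawn.
    for child in (child_1, child_2):
        if not 1 <= len(child) <= max_subsets:
            return None
    return child_1, child_2
```

Applied to the parents and masks of Fig. 3, the function returns (S11, S22, S23, S25) and (S12, S21, S24). No subset is modified, so the functions already extracted for the subsets can be reused in the offspring.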

3 Empirical Analyses 

We analyzed MFE/GA by empirically comparing it with other similar methods.
For implementing the GA we used the PGAPack Library [20] with default parameters
except those indicated in Table 1. The type of mutation and crossover operators
(see Sect. 2.3) is specified by flipping a coin when individuals are selected for
reproduction. We used 90% of the training data for constructing functions and all
training data for evaluating the constructed features using Eq. (1).

To compare MFE/GA with other CI methods, we performed experiments
over synthetic problems. Since our objective is to learn concepts with more than
one interaction, artificial problems of this kind were selected for experiments.
These problems were used as prototypes to exemplify complex interactions in
real-world hard problems. Therefore, similar results are expected for real-world
problems, where the main difficulty is complex interaction.

These concepts are defined over 12 Boolean attributes $a_1, \ldots, a_{12}$ as follows:

- $cp(i,j) = Parity(a_i, \ldots, a_6) \wedge Parity(a_7, \ldots, a_j)$

- $cdp(i,j) = Parity(a_i, \ldots, a_4) \wedge (Parity(a_{(i+j)/2}, \ldots, a_8) \vee Parity(a_j, \ldots, a_{12}))$

- $P(3,6) \wedge (l) = Parity(a_3, \ldots, a_6) \wedge$ (exactly $l$ attributes in $\{a_7, \ldots, a_{12}\}$ are true)

- $P(3,6) \vee (l) = Parity(a_3, \ldots, a_6) \vee$ (exactly $l$ attributes in $\{a_7, \ldots, a_{12}\}$ are true)

All of the above concepts have complex interaction among attributes that can be
represented by several smaller interactions.
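The concept definitions can be written down directly in Python, which also makes it possible to check the number of relevant attributes reported for each concept. In this sketch the function names are hypothetical, an instance is a tuple of 12 bits with `a[0]` standing for $a_1$, and Parity is assumed to be true when an odd number of its arguments are true.

```python
from itertools import product


def parity(bits):
    """Assumption: odd parity (true when an odd number of bits are set)."""
    return sum(bits) % 2 == 1


def cp(i, j):
    return lambda a: parity(a[i - 1:6]) and parity(a[6:j])


def cdp(i, j):
    m = (i + j) // 2
    return lambda a: parity(a[i - 1:4]) and (parity(a[m - 1:8])
                                             or parity(a[j - 1:12]))


def p36_and(l):
    return lambda a: parity(a[2:6]) and sum(a[6:12]) == l


def p36_or(l):
    return lambda a: parity(a[2:6]) or sum(a[6:12]) == l


def relevant_attributes(concept, n=12):
    """Count attributes whose flip changes the concept for some instance."""
    count = 0
    for pos in range(n):
        for a in product((0, 1), repeat=n):
            flipped = a[:pos] + (1 - a[pos],) + a[pos + 1:]
            if concept(a) != concept(flipped):
                count += 1
                break
    return count
```

For example, `relevant_attributes(cdp(1, 9))` returns 12, while `relevant_attributes(cdp(3, 11))` returns 6, in agreement with the numbers of relevant attributes listed for these concepts.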

For each concept MFE/GA was run 20 times independently using 5% of
shuffled data for training and the rest for final evaluation. When MFE/GA
finished, its performance was evaluated by the accuracy of C4.5 [21] on the modified
data after adding the constructed features, using the 95% unseen data as test data.

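Table 1. GA's modified parameters

| GA Parameter            | New Value | GA Parameter                   | New Value |
|-------------------------|-----------|--------------------------------|-----------|
| Population Size         | 100       | Mutation Probability           | 0.01      |
| Max Iteration           | 350       | Num. of Strings to be Replaced | 90        |
| Max No Change Iteration | 100       |                                |           |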
Table 2. Summary of CI methods

| CI Method | Algebraic Representation | Feature Evaluation | Genetic Search |
|-----------|--------------------------|--------------------|----------------|
| Fringe    | Yes                      | Not individually   | No             |
| Grove     | Yes                      | Individually       | No             |
| Greedy3   | Yes                      | Individually       | No             |
| LFC       | Yes                      | Individually       | No             |
| MRP       | No                       | Individually       | No             |
| DCI       | No                       | Individually       | Yes            |
| MFE/GA    | No                       | Not individually   | Yes            |









Table 3. System comparison by average accuracy. An asterisk (*) marks the entries set in bold.

| Concept         | Num. of Relevant Attributes | Prior Best Result | MRP         | DCI + C4.5 | MFE/GA + C4.5 |
|-----------------|-----------------------------|-------------------|-------------|------------|---------------|
| cp(4,9)         | 6                           | 86.1 (Fringe)     | 99.0 (1.6)  | 97.2 (1.8) | 98.9 (4.0)    |
| cp(3,10)        | 8                           | 73.4 (C4.5-Rules) | 89.9 (1.2)  | 81.6 (1.6) | 95.7 (8.9)*   |
| cp(2,11)        | 10                          | 73.9 (C4.5)       | 91.7 (5.7)  | 68.0 (1.2) | 97.1 (4.8)*   |
| cdp(3,11)       | 6                           | 98.4 (Fringe)     | 97.3 (3.9)  | 97.6 (1.6) | 99.2 (3.1)    |
| cdp(2,10)       | 9                           | 78.1 (Fringe)     | 92.0 (6.5)  | 67.7 (2.0) | 86.6 (10.3)   |
| cdp(1,9)        | 12                          | 62.5 (C4.5-Rules) | 81.3 (3.0)* | 56.4 (2.5) | 71.1 (7.4)    |
| P(3,6)∧(2)      | 10                          | 88.1 (C4.5)       | 90.4 (4.5)  | 81.0 (1.5) | 95.9 (2.9)*   |
| P(3,6)∧(3)      | 10                          | 83.4 (C4.5)       | 87.6 (5.4)  | 75.9 (2.4) | 94.1 (5.4)*   |
| P(3,6)∧(3 or 2) | 10                          | 70.4 (Fringe)     | 79.0 (2.3)  | 65.4 (2.6) | 90.8 (5.6)*   |
| P(3,6)∨(2)      | 10                          | 70.5 (Fringe)     | 97.1 (2.9)* | 57.9 (2.2) | 90.3 (6.6)    |
| P(3,6)∨(3)      | 10                          | 68.0 (Fringe)     | 95.9 (4.4)  | 59.3 (1.7) | 92.1 (7.1)    |
| P(3,6)∨(3 or 2) | 10                          | 75.1 (Fringe)     | 80.4 (6.0)  | 69.5 (2.2) | 92.5 (7.5)*   |



We compared MFE/GA with C4.5 and C4.5-Rules [21], which are similarity-based
learners; Fringe, Grove and Greedy3 [22], and LFC [23], which are greedy-based
CI methods that use an algebraic representation of features; and MRP [17],
which is a greedy CI method with a non-algebraic form of representation (see
Sect. 1). Among these CI methods only Fringe constructs several features at once. The other
methods consider features one at a time. We also compared MFE/GA with DCI
[18], which applies GA to reduce the problem of local minima (see Sect. 1). This
method uses a feature representation similar to that of MFE/GA. However, it constructs
and evaluates one feature during each generation. Therefore, when an interaction
among a large set of attributes exists in the concept, this method fails to learn the
concept. Table 2 summarizes the CI methods that are used in our experiments.

Table 3 gives a summary of MFE/GA's average accuracy over 20 runs and
its comparison with other systems' average accuracy. In the third column of the
table, we show the best results among C4.5, C4.5-Rules, Fringe, Grove, Greedy3
and LFC, as reported in [17]. Numbers in parentheses indicate the standard
deviation. Bold indicates that, at a significance level of 0.05, this accuracy is the
better of MRP and MFE/GA.

In these experiments, LFC, Greedy3 and Grove never appear as the best
competitor among CI methods with algebraic representation. This may be due
to the fact that they evaluate each proposed feature individually. When more
than one interaction exists among attributes, a feature that encapsulates a single
interaction may not provide, by itself, enough information about the final target
concept, and is therefore evaluated as an irrelevant feature.

MRP gives better results than other greedy methods because of its non-algebraic
form of representing features. When high interaction exists among
attributes, a more complex algebraic feature is needed to abstract the interaction.
However, a non-algebraic representation may capture the structure of the
interaction and abstract it more easily.

Experiments in [18] show that the genetic-based method, DCI, outperforms
MRP when the concept consists of only one complex interaction over a large number
of attributes, e.g. $Parity(a_1, \ldots, a_8)$. But for concepts with more than one
interaction, like those in Table 3, DCI cannot achieve good accuracy because it is
incapable of constructing more than one feature. It finds a subset of interacting
attributes and constructs one feature to encapsulate the interaction. But as the
interaction is complex, the constructed feature is not good enough to outline all
the interaction and, hence, MRP's greediness outperforms this method.

However, as the number of interacting attributes grows, the concept becomes
more complex to learn. Since MRP constructs and evaluates features
separately with a greedy search, its performance decays. MFE/GA successfully
breaks down the interaction over relevant attributes into two or more interactions
over smaller subsets of attributes using a global search; therefore, it
gives better accuracy than other methods on most concepts of Table 3.

The synthetic concepts $cdp(i,j)$ illustrate well the different behaviors and
advantages of MRP and MFE/GA. Each of the concepts $cdp(1,9)$, $cdp(2,10)$ and
$cdp(3,11)$ involves three parity relations combined by simple interactions
(conjunction and disjunction of parity). The three concepts differ in the degree of
parity involved (4, 3, and 2, respectively) but, perhaps more importantly, they
also differ in the ratio of relevant attributes (12/12, 9/12, and 6/12,
respectively). This is why all results in Table 3 indicate that
$cdp(1,9)$ is the most difficult concept to learn: it has no irrelevant attributes that,
when projected away, allow the complex substructures of the concept to become
apparent when learning from only 5% of the data.

This affects MFE/GA to a higher degree than it affects MRP, probably due 
to the differences between the two systems’ biases. In particular, considering cdp(1,9), 
MRP’s focus on learning one single best relation probably guides learning toward 
Parity(a1, ..., a4). As stated above, the interactions that combine parity features 
in this concept are simple (conjunction and disjunction), so MRP easily finds its 
way toward that parity feature. Had it used only that single feature to classify 
unseen data, it would have obtained even higher accuracy than it does (up to 
87%). Perhaps, due to overfitting, MRP’s heuristic function does not allow the 
system to reach this theoretically best possible performance on this concept. 
However, MRP’s bias gets it closer to the goal than MFE/GA. 

MFE/GA’s bias is, in some sense, opposite to MRP’s. It focuses on learning 
multiple features at once in order to evaluate them in combination. This higher flexibility 
in searching a large and complex feature space makes the system more dependent 
on data quality (since features are extracted from training data). Thus, MFE/GA 
exploits the redundancy in data for cdp(3,11) better than MRP. Since concepts 
cdp(2,10) and cdp(3,11) are defined, respectively, over 9 and 6 attributes of a 
total of 12 attributes, the 5% training data is likely to contain repeated parts 
of the concept structure, which become more apparent when projecting away the 
irrelevant attributes. However, this does not make these concepts easy to learn 
by non-CI methods, due to the complexity of the interactions involved. MRP 
tries to learn each of these two concepts as a single relation, whereas MFE/GA 
beats it by seeing the multiple features involved at once. On the other hand, for 
cdp(1,9), all 12 attributes are relevant, and so there is little or no redundancy in 
the 5% training data. Therefore, MFE/GA’s more flexible search gets trapped 
in a local minimum, overfitting the data, whereas MRP is favored by its strong bias 
for one best relation, which in this case does indeed exist and is easy to find. 

It is important to note that in all experiments MFE/GA generates close 
approximations to the sub-interactions; excluding experiments over cdp(1,9), in 
more than 86% of experiments our method successfully finds the exact subsets 
of interacting attributes. 

Our use of synthetic concepts allows us to analyze system behavior deeply 
before moving on to try to solve real-world problems with difficulties similar to 
those exemplified by these synthetic concepts. 



4 Conclusion 

This paper has presented MFE/GA, a new CI method for constructing a 
combination of features to highlight the relation among attributes when a highly complex 
interaction exists in the target concept. The method applies a non-algebraic form 
of representing features, which is more adequate for constructing features when 
the interaction is complex. 

The genetic approach provides the ability to construct and evaluate 
several features at once. Most CI methods consider features individually. When the 
complex interaction is of higher order and needs to be represented by more than 
one feature, each feature by itself may not give enough information about the 
interaction and is therefore evaluated as irrelevant. The new features should be 
considered in combination when evaluated, as if they built up a theory 
composed of intermediate concepts that bridge the gap between a primitive low-level 
representation and a high-level complex concept. 

Our experiments show the advantage of the non-algebraic representation of 
features over the algebraic representation. They also show that CI methods that consider 
features individually fail to learn concepts when the number of interacting attributes 
grows. Our method, with GA-based global search and non-algebraic 
representation, successfully finds the combination of features that represents the interaction and 
outperforms other CI methods when the interaction becomes more complex. 

Abstract. High-precision Question Answering and Information Retrieval 
systems need to incorporate Natural Language Processing techniques. 
For this reason, this paper presents a method to determine the semantic 
role of a constituent. The goal is to integrate the method into a Question 
Answering system and into an Information Retrieval system. Several 
experiments with the semantic role method, named SemRol, are shown. 



1 Introduction 

One of the challenges of applications such as Information Retrieval (IR) or 
Question Answering (QA) is to develop high-quality systems (high-precision IR/QA). 
To do so, it is necessary to involve Natural Language Processing (NLP) 
techniques in this kind of system. Among the NLP techniques that 
could improve Information Retrieval or Question Answering systems are 
Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL). This 
paper presents a method of Semantic Role Labeling that uses Word Sense 
Disambiguation. This research is part of the R2D2 project (see footnote 1). 

A semantic role is the relationship between a syntactic constituent and a 
predicate. For instance, in the next sentence 

(E0) The executives gave the chefs a standing ovation 

The executives has the Agent role, the chefs the Recipient role and a standing 
ovation the Theme role. 

The problem of Semantic Role Labeling is not trivial. In order to identify 
the semantic roles of the arguments of a verb, two phases have to be completed 
first. Firstly, the sense of the verb is disambiguated. Secondly, the argument 
boundaries of the disambiguated verb are identified. 



1 This paper has been supported by the Spanish Government under project 
"R2D2: Recuperación de Respuestas de Documentos Digitalizados" 
(TIC2003-07158-C04-01). 

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 256-265, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 

First of all, the sense of the verb has to be obtained. Why is it necessary to 
disambiguate the verb? The following examples show the reason. 

(E1) John gives out lots of candy on Halloween to the kids on his block 

(E2) The radiator gives off a lot of heat 

Depending on the sense of the verb, a different set of roles must be 
considered. For instance, Figure 1 shows three senses of the verb give (give.01, give.04 
and give.06) and the set of roles of each sense. Sentence (E0) matches 
sense give.01, so the roles giver, thing given and entity given to are 
considered. Sentence (E1), however, matches sense give.06, and sentence 
(E2) matches sense give.04, so the sets of roles are (distributor, thing 
distributed, distributed) and (emitter, thing emitted), respectively. In sentence 
(E1), John has the distributor role, lots of candy the thing distributed role, the 
kids on his block the distributed role and on Halloween the temporal role. In 
sentence (E2), the radiator has the emitter role and a lot of heat the thing emitted 
role. These examples show the relevance of WSD in the process of assigning 
semantic roles. 



<roleset id="give.01" name="transfer"> <roles> 
<role n="0" descr="giver" vntheta="Agent"/> 
<role n="1" descr="thing given" vntheta="Theme"/> 
<role n="2" descr="entity given to" vntheta="Recipient"/> 
</roles> </roleset> 

<roleset id="give.04" name="emit"> <roles> 
<role n="0" descr="emitter"/> 
<role n="1" descr="thing emitted"/> 
</roles> </roleset> 

<roleset id="give.06" name="transfer"> <roles> 
<role n="0" descr="distributor"/> 
<role n="1" descr="thing distributed"/> 
<role n="2" descr="distributed"/> 
</roles> </roleset> 

Fig. 1. Some senses and roles of the frame give in PropBank [13] 



In the second phase, the argument boundaries are determined. For instance, 
in the sentence (E0), the argument boundaries recognized are 

[The executives] gave [the chefs] [a standing ovation] 

Once these two phases are applied, the assignment of semantic roles can be 
carried out. 

To achieve high-precision IR/QA systems, recognizing and labeling 
semantic arguments is a key task for answering “Who”, “When”, “What”, “Where”, 
“Why”, etc. For instance, the following questions could be answered with 
sentence (E0). The Agent role answers question (E3) and the Theme role 
answers question (E4). 

(E3) Who gave the chefs a standing ovation? 

(E4) What did the executives give the chefs? 

These examples show the importance of semantic roles in applications such 
as Information Retrieval or Question Answering. 

Several works have tried to use WSD or Semantic Role Labeling in 
IR or QA systems, without success. This is mainly due to two reasons: 

1. The low precision achieved in these tasks. 

2. The low portability of these methods. 

It is easy to find WSD and Semantic Role Labeling methods that work with 
high precision for a specific task or a specific domain. Nevertheless, this precision 
drops when the domain or the task changes. For these reasons, this paper 
addresses the problem of Semantic Role Labeling integrated with a WSD system. A 
corpus-based method is presented, and several experiments on 
both the WSD and the Semantic Role Labeling modules are shown. In the near future, a QA 
system with this Semantic Role Labeling module using WSD will be developed 
within the R2D2 framework. 

The remainder of the paper is organized as follows: section 2 reviews 
the state of the art in automatic Semantic Role Labeling systems (subsection 2.1) 
and then presents the maximum entropy-based method (subsection 2.2). 
Sections 3 and 4 describe the experimental data and evaluate the 
results obtained with the method, respectively. Finally, 
section 5 concludes. 

2 The SemRol Method 

The method, named SemRol, presented in this section consists of three phases: 

1. Verb Sense Disambiguation phase (VSD) 

2. Argument Boundaries Disambiguation phase (ABD) 

3. Semantic Role Disambiguation phase (SRD) 

These phases are related, since the output of the VSD phase is the input of the ABD 
phase, and the output of the ABD phase is the input of the SRD phase. So, the success 
of the method depends on the success of the three phases. 

Both the Verb Sense Disambiguation phase and the Semantic Role Disambiguation 
phase are based on Maximum Entropy (ME) models. The Argument Boundaries 
Disambiguation and Semantic Role Disambiguation phases take care of the recognition 
and labeling of arguments, respectively. The VSD module is a new phase in the 
task. It disambiguates the sense of the target verbs, so the task becomes more 
straightforward because semantic roles are assigned at the sense level. 

In order to build this three-phase learning system, training and development 
data sets are used. They are taken from the PropBank corpus [13], which is the Penn Treebank 
corpus [11] enriched with predicate-argument structures. It addresses predicates 
expressed by verbs and labels core arguments with consecutive numbers (A0 to 
A5), trying to maintain coherence across different predicates. A number of 
adjuncts, derived from the Treebank functional tags, are also included in the PropBank 
annotations. 

An earlier version of this method is presented in [12]. This paper shows 
the main features of the VSD and SRD phases and the experiments that support the 
results. 

2.1 Background 

Several approaches [3] have been proposed to identify semantic roles or to build 
semantic classifiers. The task has usually been approached as a two-phase 
procedure consisting of the recognition and the labeling of arguments. 

Regarding the learning component of the systems, several approaches can be 
found in the CoNLL 2004 shared task [1]. For instance, Maximum Entropy 
models ([2]; [10]), Brill’s Transformation-based Error-driven Learning ([8]; [18]), 
Memory-based Learning ([17]; [9]), vector-based linear classifiers ([7]; [14]), Voted 
Perceptrons [4] or SNoW, a Winnow-based network of linear separators [15]. 

In these systems only partial syntactic information, i.e., words, part-of-speech 
(PoS) tags, base chunks, clauses and named entities, is used. 

2.2 The Core of SemRol Method 

The method consists of three main modules: i) Verb Sense Disambiguation (VSD) 
Module, ii) Argument Boundaries Disambiguation (ABD) Module, and iii) Se- 
mantic Role Disambiguation (SRD) Module. 

First of all, the process to obtain the semantic role needs the sense of the target 
verb. After that, several heuristics are applied in order to obtain the arguments of 
the sentence. And finally, the semantic roles that fill these arguments are obtained. 

Verb Sense Disambiguation Module. This module is based on the WSD 
system developed by [16]. It is based on conditional ME probability models. 

The learning module produces classifiers for each target verb. This module 
has two subsystems. The first subsystem processes the learning corpus in order 
to define the functions that describe the linguistic features of each context. 
The second subsystem of the learning module estimates the 
coefficients and stores the classification functions. 

The classification module carries out the disambiguation of new contexts 
using the previously stored classification functions. When ME does not have 
enough information about a specific context, several senses may achieve the 
same maximum probability and thus the classification cannot be done properly. 
In these cases, the most frequent sense in the corpus is assigned. However, this 
heuristic is only necessary for a minimum number of contexts or when the set of 
linguistic attributes processed is very small. 
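
The back-off rule described above can be sketched as follows. This is a minimal illustration, not the implementation of [16]: the function name, the tolerance used to detect ties and the example probabilities are assumptions.

```python
from collections import Counter


def choose_sense(probabilities, corpus_senses, tol=1e-9):
    """Return the most probable sense, backing off to corpus frequency on ties.

    probabilities: dict mapping each candidate sense to its ME probability.
    corpus_senses: list of sense labels observed for this verb in training.
    """
    best = max(probabilities.values())
    tied = [s for s, p in probabilities.items() if abs(p - best) <= tol]
    if len(tied) == 1:
        return tied[0]
    # Several senses share the maximum probability: the classifier cannot
    # decide, so the most frequent sense in the corpus is assigned.
    return Counter(corpus_senses).most_common(1)[0][0]


# Hypothetical probabilities for two contexts of the verb "give".
clear_case = choose_sense(
    {"give.01": 0.1, "give.04": 0.7, "give.06": 0.2},
    ["give.01", "give.01", "give.06"],
)
tied_case = choose_sense(
    {"give.01": 0.4, "give.04": 0.4, "give.06": 0.2},
    ["give.01", "give.01", "give.06"],
)
```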

The set of features defined for the training of the system is based on words, 
PoS tags, chunks and clauses in the local context. 

Argument Boundaries Disambiguation Module. After determining the 
sense of every target verb of the corpus, it is necessary to determine the 
argument boundaries of those verbs. To do so, left and right arguments have 
to be considered. The left argument is the noun phrase to the left of the target verb, 
and the right argument is the noun phrase to the right of the target verb. In addition, 
if there is a prepositional phrase next to the right noun phrase, it is 
considered a second right argument. In any case, the phrases must belong to the 
same clause as the target verb. So, in sentence (E5) the target verb is narrow, 
the left argument is the current account deficit and the right arguments are only 1.8 
billion and in September. 

(E5) The current account deficit will narrow to only 1.8 billion in September 

Finally, the verbal phrase of the target verb is considered as the verb 
argument, and modal verbs and the particles not and n’t in the verbal phrase of the target 
verb are also considered arguments. For instance, in the previous sentence, will is considered an 
argument. 
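
A minimal sketch of these heuristics is given below. It assumes a simplified input in which the clause of the target verb is already available as a list of (chunk tag, text) pairs; the function name, the list of modal verbs and the chunking of (E5) are assumptions made for illustration, and the verb argument itself is omitted.

```python
# Assumed closed lists; the paper does not enumerate them.
MODALS = {"will", "would", "can", "could", "may", "might", "shall", "should", "must"}
NEGATIONS = {"not", "n't"}


def find_arguments(chunks, verb_index):
    """Apply the left/right argument heuristics to one clause.

    chunks: list of (tag, text) pairs for the clause of the target verb.
    verb_index: position of the verb phrase of the target verb in `chunks`.
    """
    args = []
    # Left argument: nearest noun phrase to the left of the verb phrase.
    for tag, text in reversed(chunks[:verb_index]):
        if tag == "NP":
            args.append(("left", text))
            break
    # Modal verbs and negation particles inside the verb phrase.
    for word in chunks[verb_index][1].split():
        if word.lower() in MODALS:
            args.append(("modal", word))
        elif word.lower() in NEGATIONS:
            args.append(("negation", word))
    # Right argument: nearest noun phrase to the right of the verb phrase.
    for i in range(verb_index + 1, len(chunks)):
        if chunks[i][0] == "NP":
            args.append(("right", chunks[i][1]))
            # A prepositional phrase directly after it is a second right argument.
            if i + 1 < len(chunks) and chunks[i + 1][0] == "PP":
                phrase = chunks[i + 1][1]
                if i + 2 < len(chunks) and chunks[i + 2][0] == "NP":
                    phrase += " " + chunks[i + 2][1]
                args.append(("right", phrase))
            break
    return args


# Sentence (E5), chunked by hand for this example.
E5 = [
    ("NP", "The current account deficit"),
    ("VP", "will narrow"),
    ("PP", "to"),
    ("NP", "only 1.8 billion"),
    ("PP", "in"),
    ("NP", "September"),
]
E5_ARGUMENTS = find_arguments(E5, verb_index=1)
```

On this input the sketch returns the arguments named in the text: the current account deficit, will, only 1.8 billion and in September.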

The number of successes for left arguments, modal arguments 
and negative arguments is expected to be high, so these will not account for much 
of the error. The results for right arguments, however, will probably be lower. In future 
work we will study how to determine the arguments of the verbs using a 
machine learning strategy, such as a maximum entropy conditional probability 
method or a support vector machine method [6]. This strategy should allow us to 
determine the argument boundaries more accurately. 

Semantic Role Disambiguation Module. Finally, the role of each argument of each target 
verb is determined, depending on the verb sense. This task uses a conditional ME 
probability model similar to the one used in the WSD task. In this case, 
features are extracted for each argument of every target verb. These features 
are used to classify those arguments. Instead of working with all roles [5], in this 
classification the classes considered are the roles of each sense of each verb. This 
increases the total number of classes for the full SRD task, but it considerably reduces 
the number of classes that are taken into account for each argument. 
In sentence (E6), the sense of fail is 01, so only the classes of the 
roles 0, 1, 2, 3 of fail.01 are considered, whereas the roles 0, 1 of fail.02 
are not. This is possible because the sense of every 
target verb was determined in the VSD module. Figure 2 shows the roles of the verb 
fail. 
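
The restriction of classes by sense can be sketched as follows. The role numbers are transcribed from Fig. 2; the class naming scheme (sense, colon, role number), the function names and the example scores are assumptions made for illustration.

```python
# Role numbers of each sense of "fail", transcribed from Fig. 2.
ROLES_BY_SENSE = {
    "fail.01": ["0", "1", "2", "3"],
    "fail.02": ["0", "1"],
}


def classes_for_sense(sense):
    """Classes that may be assigned to an argument of a verb with this sense."""
    return [f"{sense}:{n}" for n in ROLES_BY_SENSE[sense]]


def classify_argument(scores, sense):
    """Pick the best-scoring class among those allowed for the given sense.

    scores: dict mapping class names to model scores (missing classes score 0).
    """
    allowed = classes_for_sense(sense)
    return max(allowed, key=lambda c: scores.get(c, 0.0))


# All classes of the full task for this verb: one per (sense, role) pair.
ALL_CLASSES = [c for s in ROLES_BY_SENSE for c in classes_for_sense(s)]

# Hypothetical scores: the best class overall belongs to fail.02, but it is
# ignored because the VSD module chose fail.01 for this occurrence.
chosen = classify_argument(
    {"fail.02:1": 0.9, "fail.01:1": 0.6, "fail.01:0": 0.3}, "fail.01"
)
```

For this verb the full task has six classes, but only the four classes of fail.01 compete for each argument in sentence (E6).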

(E6) Confidence in the pound is widely expected to take another sharp dive 
if trade figures for September, due for release tomorrow, fail to show a 
substantial improvement from July and August’s near-record deficits 

The set of features defined for the training of the system is based on words, 
PoS tags, chunks and clauses in the local context. 

<roleset id="fail.01" name="not succeed"> <roles> 
<role n="0" descr="assessor of not failing (professor)"/> 
<role n="1" descr="thing failing"/> 
<role n="2" descr="task"/> 
<role n="3" descr="benefactive"/> 
</roles> </roleset> 

<roleset id="fail.02" name="give failing grade"> <roles> 
<role n="0" descr="teacher"/> 
<role n="1" descr="student"/> 
</roles> </roleset> 

Fig. 2. Senses and roles of the frame fail in PropBank 



3 Experimental Data 

Our method has been trained and evaluated using the PropBank corpus [13], 
which is the Penn Treebank [11] corpus enriched with predicate-argument 
structures. To be precise, the data consists of sections of the Wall Street Journal. 
The training set corresponds to sections 15-18 and the development set to 
section 20. 

PropBank annotates the Penn Treebank with argument structures related 
to verbs. The semantic roles considered in PropBank are the following [3]: 

- Numbered arguments (A0-A5, AA): Arguments defining verb-specific roles. 
Their semantics depend on the verb and the verb usage in a sentence, or 
verb sense. In general, A0 stands for the agent and A1 corresponds to the 
patient or theme of the proposition, and these two are the most frequent 
roles. However, no consistent generalization can be made across different 
verbs or different senses of the same verb. PropBank takes the definition of 
verb senses from VerbNet, and for each verb and each sense defines the set 
of possible roles for that verb usage, called the roleset. 

- Adjuncts (AM-): General arguments that any verb may take optionally. 
There are 13 types of adjuncts: 

• AM-LOC: location 
• AM-EXT: extent 
• AM-DIS: discourse marker 
• AM-ADV: general-purpose 
• AM-NEG: negation marker 
• AM-MOD: modal verb 
• AM-CAU: cause 
• AM-TMP: temporal 
• AM-PRP: purpose 
• AM-MNR: manner 
• AM-DIR: direction 

- References (R-): Arguments representing arguments realized in other parts 
of the sentence. The role of a reference is the same as the role of the 
referenced argument. The label is an R- tag prefixed to the label of the 
referent, e.g. R-A1. 

- Verbs (V): Participant realizing the verb of the proposition. 

Table 1. Results on the development set 

Module  Successes  Fails  Not disambiguated  Precision  Recall  F1 
VSD     3650       486    169                0.88       0.85    0.86 
ABD     4426       4201   2494               0.51       0.40    0.45 
SRD     5085       5464   572                0.48       0.46    0.47 

Training data consists of 8936 sentences, with 50182 arguments and 1838 
distinct target verbs. Development data consists of 2012 sentences, with 11121 
arguments and 978 distinct target verbs. 

Apart from the correct output, both datasets contain the input part of the 
data: PoS tags, chunks and clauses. Besides, the sense of verb is available if the 
word is a target verb. 



4 Results and Discussion 

The results of the three modules are shown in Table 1. These results 
have been obtained on the development set. Modules have been evaluated based on 
precision, recall and the F1 measure. In each case, precision is the proportion of senses, 
arguments or roles predicted by the system which are correct, and recall is the 
proportion of correct senses, correct arguments or correct roles which are predicted 
by each module. That is, the senses, arguments and roles not disambiguated are 
not considered when computing precision, but they are considered when computing 
recall. F1 is the harmonic mean of precision and recall: $F_{\beta=1} = 2pr/(p+r)$. 
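
As a check on these definitions, the following sketch recomputes the measures from the counts reported in Table 1 (successes, fails and not disambiguated). The function and variable names are illustrative only.

```python
def evaluate(successes, fails, not_disambiguated):
    """Precision, recall and F1 as defined in the text.

    Items that were not disambiguated are excluded from the precision
    denominator but included in the recall denominator.
    """
    predicted = successes + fails
    total = predicted + not_disambiguated
    precision = successes / predicted
    recall = successes / total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from Table 1: (successes, fails, not disambiguated).
COUNTS = {
    "VSD": (3650, 486, 169),
    "ABD": (4426, 4201, 2494),
    "SRD": (5085, 5464, 572),
}
SCORES = {
    module: tuple(round(v, 2) for v in evaluate(*counts))
    for module, counts in COUNTS.items()
}
```

Rounded to two decimals, the recomputed values agree with the precision, recall and F1 columns of Table 1.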

4.1 VSD Module Results 

In this experiment only one set of features has been considered: features about 
the content words in a sentence. Table 1 shows that 3650 verbs have been 
disambiguated successfully, and 655 unsuccessfully. Of these, 169 are due to 
verbs that were not disambiguated and 486 to mistakes in the disambiguation process. As a 
result, a precision of 88% is obtained. These results show the quality of the 
ME module and indicate that it is correctly defined. Besides, it is 
expected that tuning with the other sets of features (see section 2.2) will 
improve the results. 

Table 2. Results on the development set. SRD module 

Role      Precision  Recall   Fβ=1    Role      Precision  Recall   Fβ=1 
A0        47.22%     44.56%   45.85   AM-MNR    66.29%     17.66%   27.90 
A1        44.99%     69.12%   54.50   AM-MOD    82.35%     39.59%   53.47 
A2        52.26%     26.62%   35.28   AM-NEG    80.00%     64.12%   71.19 
A3        30.16%     12.75%   17.92   AM-PNC    39.47%     15.00%   21.74 
A4        64.52%     13.61%   22.47   AM-PRD    0.00%      0.00%    0.00 
A5        100.00%    25.00%   40.00   AM-PRP    0.00%      0.00%    0.00 
AM-ADV    21.93%     7.10%    10.73   AM-REC    0.00%      0.00%    0.00 
AM-CAU    0.00%      0.00%    0.00    AM-TMP    61.98%     21.48%   31.90 
AM-DIR    55.56%     8.33%    14.49   R-A0      0.00%      0.00%    0.00 
AM-DIS    86.44%     25.00%   38.78   R-A1      0.00%      0.00%    0.00 
AM-EXT    70.00%     14.29%   23.73   R-A2      0.00%      0.00%    0.00 
AM-LOC    44.44%     22.61%   29.97   R-AM-LOC  100.00%    25.00%   40.00 
R-AM-TMP  33.33%     33.33%   33.33   V         97.44%     97.44%   97.44 
all       61.92%     59.62%   60.75 
all-{V}   47.42%     44.98%   46.17 


4.2 ABD Module Results 

In this case, a total of 4426 arguments have been detected successfully, while 6695 
have been either detected erroneously (4201) or missed (2494). Therefore, the precision of the ABD 
module is 51%. 

These experiments have been carried out assuming correct senses for the 
target verbs. In this way, the ABD module has been evaluated independently of the 
VSD module. 

These results confirm the need for determining the arguments of the verbs 
by defining new heuristics or using a machine learning strategy. 

4.3 SRD Module Results 

In order to evaluate this module, correct senses of the verbs and correct argument 
boundaries have been presumed. So, SRD module has been tested independently 
of VSD and ABD modules. 

Table 1 shows a precision of 48%. For further details, the precision for each 
kind of argument is shown in Table 2. If the verb argument is also considered, 
precision goes up to 62%. These results show that the ME module is correctly 
defined. However, a tuning phase is needed in order to improve them. Besides, 
a precision of 0.00% for several R- arguments shows the need for a co-reference 
resolution module. 



5 Conclusions and Work in Progress 

In this paper, a Semantic Role Labeling method using a WSD module is 
presented. It is based on maximum entropy conditional probability models. The 
method presented consists of three sub-tasks. First of all, the process of 
obtaining the semantic role needs the sense of the target verb. After that, 
several heuristics are applied in order to obtain the arguments of the sentence. 
Finally, the semantic roles that fill these arguments are obtained. Training and 
development data are used to build this learning system. 

Results for the VSD, ABD and SRD modules have been shown. Currently, 
we are working on the definition of new features for the SRD module. The 
re-definition of the heuristics is also planned in order to improve the results. After 
that, we are going to work on the tuning phase in order to achieve an optimum 
identification of the semantic roles. 

On the other hand, we are working on the integration of semantic aspects 
into Question Answering and Information Retrieval systems, in order to obtain 
high-precision QA/IR systems. We will shortly present results of this 
integration. 

Abstract. In order to extend the capabilities that machines can support 
in a spoken dialogue system, we present in this paper a new 
problem: incorporating multi-session dialogue in such a system. Instead of only handling 
a single dialogue, such a system can take an intermediary role, communicating 
with many users in several discontinuous sessions in order to reach a compromise 
between them. We describe a new approach for modeling the multi-session dialogue 
and then concentrate on the multi-session management of such a system, 
dedicated to a complete service comprising several tasks. 



1 Introduction 

Spoken dialogue systems have attracted much attention as a way of 
communicating with machines through speech. These systems normally enable users 
to interact with them in order to perform a certain task. For example, the CMU 
Communicator system is aimed at helping a user to create a travel itinerary consisting 
of air transportation, hotel reservations and car rentals [13], ARISE allows users to 
consult the train timetable [2], TRINDI enables users to make choices in 
route planning [14], etc. The dialogue in these systems in 
particular, and more generally in current dialogue systems, contains just a few 
exchanges between a user and the system. 

In the context of the company voice portal project PVE (Portail Vocal d’Entreprise) 
[15], our analysis of use, which we carried out in hospitals, judicial and company 
offices, shows that the voice service is naturally very useful for applications such as 
information requests, confirmation of a request, and secretarial work such as transferring 
calls, scheduling appointments and reserving rooms. Spoken dialogue in these 
situations is normally short but contains complex utterances. However, users always 
require a complete service, defined as the complete resolution of a problem as in a 
face-to-face situation. For example, in the room reservation service, the spoken dialogue 
system must behave as and take on the role of a virtual secretary. This means 
that the user is not only able to reserve a room, but also to request the confirmation of 
all participants and their availability. Moreover, the user should also be able to ask the 
system to negotiate with others in order to obtain a good compromise between them. 

Consider the following example. A user D would like to book the Lafayette room 
and calls the system S. Unfortunately, this room is already taken by another person, P. 
However, D has greater priority than P (perhaps because of his hierarchical position), so he 
asks the system to contact P and tell him to leave this room for him. The system then 
contacts P and fortunately reaches an agreement with him: he agrees to postpone his 
meeting to the next day. Once the system has the response, it calls D back to inform 
him of the result. 

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 266-274, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 

S1: Person D + System S 

D: Hello, I am D, could you book me the room Lafayette for tomorrow at 9 o’clock, please? 
S: I’m sorry Mr. D, this room is already taken by Mr. P... 
D: Tell him I need it and could he leave this room for me. 
S: OK, I’ll contact him and I’ll keep you up to date. 

S2: System S + Person P 

S: Hello, are you Mr. P? 
P: Yes. 
S: I’m contacting you about the Lafayette reservation. Could you leave this room for Mr. D, please? 
P: Let me see... OK, I’ll put back my meeting to tomorrow. 
S: That’s great, thank you very much. 

S3: System S + Person D 

S: Hello, Mr. D? 
D: Yes, it’s me. 
S: Mr. P has already agreed to leave the Lafayette for you at 9 o’clock tomorrow. 
D: That’s very nice, thank you. 



Thus, we can see that users require more functionality from a dialogue 
system: the spoken dialogue system should now take the role of a mediator that 
negotiates with several users in order to resolve the conflict between them. 
Multiple users may be engaged in a dialogue. Therefore, we now consider that the 
dialogue is expressed by multiple sessions with multiple users; each session is a dialogue 
between a single user and the system. In this paper, we introduce an approach for 
modeling the multi-session dialogue, and then the mechanisms to manage it in a 
spoken dialogue system. 
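
The notion of a multi-session dialogue can be sketched as a simple data structure. This is only an illustration under our own assumptions: the class and method names are hypothetical and do not describe the modeling approach introduced later in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One dialogue between a single user and the system."""
    user: str
    turns: list = field(default_factory=list)  # (speaker, utterance) pairs

    def say(self, speaker, utterance):
        self.turns.append((speaker, utterance))


@dataclass
class MultiSessionDialogue:
    """Several discontinuous sessions that serve one shared goal."""
    goal: str
    sessions: list = field(default_factory=list)

    def open_session(self, user):
        session = Session(user)
        self.sessions.append(session)
        return session

    def participants(self):
        """Users involved, in order of first appearance."""
        seen = []
        for session in self.sessions:
            if session.user not in seen:
                seen.append(session.user)
        return seen


# The room reservation example: sessions S1 (with D), S2 (with P), S3 (with D).
dialogue = MultiSessionDialogue(goal="book the Lafayette room for D")
s1 = dialogue.open_session("D")
s1.say("D", "Could you book me the room Lafayette for tomorrow at 9 o'clock?")
s2 = dialogue.open_session("P")
s2.say("S", "Could you leave this room for Mr. D, please?")
s3 = dialogue.open_session("D")
s3.say("S", "Mr. P has agreed to leave the Lafayette for you.")
```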



2 Basic Principles 

This section describes some important elements that are used in our multi-session 
modeling. As the architecture of the spoken dialogue system, we used the 
modular/multi-agent architecture described in [8] and illustrated in figure 1. The 
multi-session management presented in section 4 will be implemented in two 
modules: the “dialogue manager” and the “task manager”. 

[Figure: block diagram. Recoverable labels: Utterance, Speech Recognition, Orthographical string, Task Manager, Dialogue, Actions, Speech Synthesis, Utterance] 

Fig. 1. Architecture for a Spoken Dialogue System 



2.1 Speech Act 

Austin [1] and Searle [11] consider every utterance as an act of communication called a 
speech act. A speech act might contain just one word, several words or a complete 
sentence. Combining this with the notion of illocutionary logic, Vanderveken [12] 
defined the illocutionary force of a speech act. Following Caelen [3], it is useful to 
retain the following illocutionary forces in the human-machine dialogue domain: 



Table 1. Illocutionary Forces of a Dialogue Act 

Act    Signification 
F^A    Do or execute an action. 
F^F    Ask the hearer to perform an action. 
F^S    Communicate information in an assertive way. 
F^FS   Ask for information. 
F^P    Give a choice, make an invitation. 
F^D    Oblige to do without giving an alternative. 



Based on speech act theory and illocutionary logic, we define the notion of a
dialogue act. A dialogue act is a speech act that is annotated with its illocutionary
force. We represent a dialogue act as an illocutionary force, which specifies what the
speaker wishes to achieve, and a propositional content, which represents the semantic
schema of the statement. Each utterance can contain more than one dialogue act. For
example, the utterance “Jean Caelen is calling... I would like to book a conference
room” may be interpreted as follows:

F^S[FirstName(jean) & LastName(caelen)] & F^F[Action(toReserve) & RoomName(x)]

Here F^S and F^F are the illocutionary forces, each bracketed expression is a
propositional content (p), and FirstName, LastName, Action and RoomName are concepts.

The user dialogue act is built from the user’s utterance by the Interpreter module,
and the dialogue manager has to generate the system dialogue act in response to the
user’s utterance.
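
A dialogue act as described above can be sketched as a small data structure. The following is a minimal illustration, not the authors' implementation: the names `Force`, `DialogueAct` and `interpret_example` are assumptions, the interpretation is written by hand rather than produced by a real Interpreter module, and the `RoomName` value is left empty because the utterance does not name a room.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Force(Enum):
    """Illocutionary forces of Table 1; member names are illustrative labels."""
    DO = "do or execute an action"
    ASK_TO_DO = "ask the hearer to perform an action"
    ASSERT = "communicate information in an assertive way"
    ASK_INFO = "ask for information"
    OFFER_CHOICE = "give a choice, make an invitation"
    OBLIGE = "oblige to do without giving an alternative"


@dataclass
class DialogueAct:
    """An illocutionary force plus a propositional content (concept -> value)."""
    force: Force
    content: Dict[str, Optional[str]] = field(default_factory=dict)


def interpret_example() -> List[DialogueAct]:
    """Hand-built reading of the example utterance (no real parsing is done)."""
    return [
        DialogueAct(Force.ASSERT, {"FirstName": "jean", "LastName": "caelen"}),
        # The room is not named in the utterance, so its value stays unset.
        DialogueAct(Force.ASK_TO_DO, {"Action": "toReserve", "RoomName": None}),
    ]


acts = interpret_example()
```

One utterance thus yields a list of acts, which matches the remark that an utterance can contain more than one dialogue act.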

2.2 Dialogue Goal 

A goal is generally a task state or a mental state that one wishes to reach (for
example: to obtain information, to resolve a problem, etc.). An exchange (a series of
talking turns during which a goal is sustained) starts with the emergence of a new
goal. This goal is then transformed during the exchange and becomes a final goal (the
state of the task or of the situation at the end of the exchange), at which point the
exchange ends in success or in failure. Success requires that the goal be both
reached and satisfied. The final goal is not always predictable at the start.

A dialogue goal is the goal that is sustained during an exchange. In human-machine
dialogue, it results from the type of task considered. For example, a room
reservation implies a goal (of the task), namely the request for a room, and a
dialogue goal that leads to a communication/negotiation to reach that goal. Thus, the
dialogue goal can be satisfied while a general goal may not necessarily be satisfied
[4]. A dialogue goal is represented as a logical predicate b, and its possible states
are shown in the following table:



Table 2. Evolution of the State of Dialogue Goal

Symbol  Status      Description
?b      new         This goal has just been expressed by the user.
†b      reached     The predicate b becomes true.
‡b      satisfied   The user manifests agreement on †b; this agreement can be
                    explicit or implicit.
-b      pending     The system temporarily solves another problem.
b’      repaired    Due to a lack of understanding, the goal is modified; the
                    user does not go back on his previous goal.
sb      sub-goal    The problem is decomposed into sub-problems.
@b      abandoned   This state is the result of a failure or a voluntary abort.



A dialogue goal is formed by abstraction from the dialogue act, with the help of the
dialogue plan (which is specified in the task model by elementary goals, called task
goals, and managed by the task manager). Once the dialogue manager has formed the
dialogue goal, it sends this goal to the task manager to learn whether the goal is
reached, cannot be reached, or lacks information (states related to tasks). The
dialogue manager must then decide by itself whether this dialogue goal is satisfied,
pending or abandoned [5].
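
The interplay between the task manager's report and the dialogue manager's decision can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the decision rule, so the mapping in `decide` (missing information opens a sub-goal, an unreachable goal is put in the pending state) is hypothetical, as are all names.

```python
from enum import Enum


class GoalStatus(Enum):
    """The goal states of Table 2."""
    NEW = "new"
    REACHED = "reached"
    SATISFIED = "satisfied"
    PENDING = "pending"
    REPAIRED = "repaired"
    SUB_GOAL = "sub-goal"
    ABANDONED = "abandoned"


class TaskReport(Enum):
    """Task-related states returned by the task manager."""
    REACHED = "reached"
    UNREACHABLE = "unable to reach"
    MISSING_INFORMATION = "missing information"


def decide(report: TaskReport, user_agrees: bool = False,
           user_aborts: bool = False) -> GoalStatus:
    """Assumed decision rule of the dialogue manager (not from the paper)."""
    if user_aborts:
        return GoalStatus.ABANDONED
    if report is TaskReport.REACHED:
        # A reached goal only becomes satisfied once the user agrees.
        return GoalStatus.SATISFIED if user_agrees else GoalStatus.REACHED
    if report is TaskReport.MISSING_INFORMATION:
        return GoalStatus.SUB_GOAL   # ask for the missing piece first
    return GoalStatus.PENDING        # try to solve the obstacle elsewhere
```

Keeping the task report and the goal status as separate types mirrors the division of labour between the two modules.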

2.3 Dialogue Strategy 

The dialogue strategy δ is the way to handle the talking turns between a user and the
system so as to lead the dialogue goal to its final state (satisfied or abandoned).
The strategy aims at choosing the best adjustment direction of the goals at a given
moment. It is a decisive factor in dialogue efficiency, which is measured by the
speed of convergence of the dialogue acts towards the final goal. We distinguish two
categories of dialogue strategy [5]:

Non-inferential Strategies: strategies in which the system does not need to know
the user’s goal initially:

- Directive strategy: consists in keeping the initiative with the system to drive the
dialogue, maintaining the exchange goal and imposing a new goal.

- Reactive strategy: is used to delegate the initiative to the user, either by making
him endorse the dialogue goal or by adopting this goal.

- Constructive strategy: consists in moving the current goal in order to invoke a
detour, for example to point out an error, make a quotation, undo an old fact...

Inferential Strategies: strategies are said to be inferential when both user and
system need a perceptive knowledge of their respective goals. In these strategies, the
two speakers share the initiative:

- Cooperative strategy: consists in adopting the goal of the user by proposing one or
several solutions which lead most directly to his goal.

- Negotiated strategy: can be involved in a situation where the goals are
incompatible and both user and system want to minimize their concessions. The
negotiation is expressed by argumentative sequences (argumentation/refutation)
with the proposal of a sub-optimal solution, until convergence or acknowledgement
of failure.



3 Multi-session Modeling 

Suppose now that a spoken dialogue system must perform a complete service comprising
several tasks. We consider that a dialogue initiated by a user D to satisfy a goal
related to this service is divided into a set of discontinuous sub-dialogues, each
sub-dialogue representing a session S_k that includes an opening phase, the speech
turns between the user concerned P_k and the system S, and a closure phase. The
framework of the dialogue is therefore a sequence of sessions: the first with the
requester D, the following ones with the different addressees P_k if necessary, and
the last with D for the conclusion.

3.1 Definitions 

Suppose that during the first session S_1, the user D interacts with the system S to
resolve the goal b_D. There are three possible cases at the closure of the session:

1. b_D is satisfied (noted ‡b_D),

2. D chooses to abort his goal (noted @b_D),

3. b_D cannot be reached because it is in conflict with other goals previously
satisfied by other users of the service.

The third case leads to new sessions that try to resolve b_D: in a first step the
system S puts the goal b_D in the pending state (noted -b_D), then expands the
different solutions to resolve the conflict and initiates a negotiation with the
users whose goals are in conflict with b_D.

We define:

- Dialogue goal in conflict b_f: the dialogue goal animated by the requester D is in
conflict with the one already satisfied by the user P: b_f = (-b_D, ‡b_P)

- Tree of dialogue goals in conflict: more generally, the goal b_D is possibly in
conflict with the n satisfied goals of m other users, hereafter called the
“patients” (P_1, ..., P_m), related by AND/OR operators. This set of conflicting
goals T_f = (b_f1, b_f2, ..., b_fm) forms an AND-OR tree of dialogue goals in
conflict with b_D. Each leaf of this tree represents a goal in conflict from the
patient P_k.




Resolving the conflict for b_D means finding a path from the leaves to the root that
respects the AND/OR conditions along the tree. Each particular goal in conflict b_fk
in a leaf should be resolved in a dedicated session S_k, a dialogue with the user
P_k.

3.2 Session Coordination 

Thus, the resolution of b_D requires managing several new sessions dynamically. The
sequence of sessions follows the exploration of the AND-OR tree T_f. The goal b_D
will be satisfied if and only if there is at least one path in T_f all of whose
elements are satisfied.

The algorithm to reach b_D across sessions is as follows:



While b_D is not reached and an unmarked path remains Do
    From the tree T_f, Extract the best unmarked path [7] to reach b_D, and For each
    leaf along this path, Open a negotiation dialogue with the patient concerned in
    order to solve the local conflict
    In case of breakdown, Mark the path and Try again from the previous step
    In case of success, Stop
EndWhile

Notify D of the result: b_D is reached (noted †b_D) or abandoned (@b_D). From here,
D can of course accept this result or not, and the dialogue can then continue.

In the negotiation process, the system performs each session separately, but the
order in which these sessions are handled depends on the best path found at each step.
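
The session coordination loop can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: a "path" is taken to be a set of leaves whose joint success satisfies the AND/OR conditions, the ranking of paths (fewest negotiations first) is an assumption standing in for the method of [7], and `negotiate` is a hypothetical callback that runs one session with a patient and reports whether the local conflict was solved.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    patient: str  # the patient P_k holding the conflicting goal


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]


Node = Union[Leaf, And, Or]


def candidate_paths(node: Node) -> List[List[str]]:
    """Every set of leaves whose joint success satisfies the AND/OR conditions."""
    if isinstance(node, Leaf):
        return [[node.patient]]
    per_child = [candidate_paths(child) for child in node.children]
    if isinstance(node, Or):
        return [path for paths in per_child for path in paths]
    # And: combine one alternative from each child.
    return [[p for part in combo for p in part] for combo in product(*per_child)]


def resolve(tree: Node, negotiate: Callable[[str], bool]) -> bool:
    """Return True if b_D is reached, False if it has to be abandoned."""
    outcomes: Dict[str, bool] = {}  # at most one session per patient

    def session(patient: str) -> bool:
        if patient not in outcomes:
            outcomes[patient] = negotiate(patient)
        return outcomes[patient]

    marked: List[List[str]] = []
    for path in sorted(candidate_paths(tree), key=len):  # assumed ranking
        if all(session(p) for p in path):
            return True           # success: stop
        marked.append(path)       # breakdown: mark the path, try the next one
    return False


calls: List[str] = []


def example_negotiate(patient: str) -> bool:
    calls.append(patient)
    return patient != "P3"        # in this toy run, only P3 refuses


example_tree = Or((And((Leaf("P1"), Leaf("P2"))), Leaf("P3")))
reached = resolve(example_tree, example_negotiate)
```

In the toy run the single-leaf path through P3 is tried first and fails, after which the path through P1 and P2 succeeds.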



4 Multi-session Management 

Multi-session management has to be done through both the dialogue manager and the
task manager. The main issue here is how to manage the tree of dialogue goals in
conflict efficiently. In our approach, the dialogue manager controls both the
dialogue goal within a session and the tree of dialogue goals in conflict T_f. The
task manager, for its part, controls the triggering and the development/execution of
a session and, moreover, the coordination of the sequence of sessions.

4.1 At the Dialogue Manager Level 

In this section, we are only interested in the management of the goal in conflict
(the management of a normal dialogue goal as well as the dialogue strategy were
described in [9][6], and we do not discuss them here). The task manager should
compute the tree of dialogue goals in conflict T_f and send it to the dialogue
manager. Once the dialogue manager receives the tree, it manages the sessions and
interacts with the task manager, which acts as a problem planner.

During each negotiation, the goal b_fi evolves according to the attitude of P_i
towards b_fi. The possible attitudes are:

- give up b_fi to D without conditions,

- refuse to abandon b_fi in all cases,

- leave b_fi to D under conditions, such as modifying b_fi or requesting a new goal
b’_fi.

The first two cases are less complicated than the third, which depends on the new
conditions of P_i. These can be:

- feasible without affecting any other P,

- not feasible,

- feasible, but possibly leading to a new conflict with another P via a new session.

These attitudes apply directly to b_f via the dialogue acts of the user and are
recognized by the dialogue manager. The negotiation process for b_D finishes when it
has been reached (†b_D), or when all possible negotiations have failed and D has to
abandon his goal (@b_D).

4.2 At the Task Manager Level 

The task manager clearly plays an important role in multi-session management in a
spoken dialogue system. To ensure the coherence of the sessions, it should contain
the planning of all possible sessions, manage the sequence of sessions, and supervise
the progress of the goal in conflict.

During a session, the task manager must dynamically build T_f when conflicts arise.
Once the user requests that T_f be resolved, the task manager develops a plan to
negotiate with the patients. For each patient P_i, the task manager launches a
session to resolve b_fi, until the dialogue manager finds that b_D has been reached
or abandoned.



5 Example 

To model multi-session dialogue in a spoken dialogue system, we used a room
reservation service via telephone as a case study. Let us use the example of
Section 1 to illustrate our approach:

In S_1, the requester D directly manifests a new goal ?b_D = person(D) ∧ room(R) ∧
date(DT) ∧ toReserve(D,R,DT). However, the room requested by D was already
reserved by P as ‡b_P = person(P) ∧ room(R) ∧ date(DT) ∧ toReserve(P,R,DT).

By interacting with the task manager, the dialogue manager determines a room and
date conflict represented by b_f = (b_D, ‡b_P). The task manager then creates
T_f = {b_f} and the dialogue manager plans a new session to negotiate with P.

In the next step, the task manager interacts with the dialogue manager to launch a
new session S_2 to resolve b_f. The system S calls P, and suppose that the
negotiation succeeds in this case: P agrees to move his meeting to the next day, so
the goal in conflict is resolved, because ‡b_P becomes ‡b’_P = person(P) ∧
room(R) ∧ date(DT+1) ∧ toReserve(P,R,DT+1). The task manager should acknowledge
this new situation and plan a new session to inform D of the results.

The third session S_3 just notifies D of the state of T_f, which is now a reached
goal. Naturally, D could still reject it for some reason, but here he accepts it and
declares it satisfied. The dialogue animated by D is thus completed.
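
The three sessions of this example can be replayed with a toy reservation store. The sketch below is illustrative only: `Reservation` and `find_conflict` are hypothetical names, the calendar date chosen for DT is arbitrary, and the negotiation in the second session is hard-coded to the successful outcome described above.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class Reservation(NamedTuple):
    person: str
    room: str
    day: date


def find_conflict(request: Reservation,
                  satisfied: List[Reservation]) -> Optional[Reservation]:
    """Return the satisfied goal that blocks the request, if any."""
    for existing in satisfied:
        if (existing.room == request.room and existing.day == request.day
                and existing.person != request.person):
            return existing
    return None


dt = date(2004, 11, 22)                      # arbitrary stand-in for DT
satisfied = [Reservation("P", "R", dt)]      # P's satisfied goal
request = Reservation("D", "R", dt)          # D's new goal

# Session S_1: the request of D conflicts with the reservation of P.
conflict = find_conflict(request, satisfied)

# Session S_2: P agrees to move his meeting to DT+1 (assumed outcome).
moved = conflict._replace(day=conflict.day + timedelta(days=1))
satisfied = [moved if r == conflict else r for r in satisfied]

# Session S_3: the goal of D is now reachable and D accepts it.
reached = find_conflict(request, satisfied) is None
if reached:
    satisfied.append(request)
```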



6 Results and Conclusion 

Multi-session management in discontinuous human-machine dialogue has become
necessary to increase the capability of spoken dialogue systems. Based on a dialogue
management that reduces the dependence on the task model as much as possible, we
have built a prototype of such a system dedicated to the reservation service
targeted by the PVE project (in French). Our prototype can currently handle sessions
such as room reservation, meeting convocation, and the cancellation/modification of
a reservation. By applying our methodology of multi-session modeling and management,
our prototype can now act like a real mediator: users can ask the system to negotiate
with another user in case of room or date conflicts.

The experiments carried out with our prototype on the corpus collected during the
Wizard of Oz step of the PVE project demonstrate the validity of our theory of
multi-session management. We have also run many multi-session tests on resolving
room/date conflicts, and we will publish the official evaluation results later. In
the near future, with improved speech recognition and robust
comprehension/interpretation, our system will be completed with the best negotiation
capability.

The first results we have obtained and are obtaining not only show the importance
of multi-session management in a spoken dialogue system, but also open a new
direction in the way of bringing intelligence and speech to machines.



Abstract. In this paper, a new explanatory and high-level approach to
knowledge discovery from texts is described, which uses natural language
techniques and evolutionary computation based optimization to find
novel patterns in textual information. In addition, some results showing
the promise of the approach towards effective text mining when compared
to human performance are briefly discussed.



1 Introduction 

Knowledge Discovery from Texts (KDT), or Text Mining, is an emerging technology
for analysing large collections of unstructured documents for the purposes
of extracting interesting and novel knowledge [8], so it can be seen as a leap from
Data Mining and Knowledge Discovery from Databases (KDD).

However, Data Mining techniques cannot be immediately applied to text data
for the purposes of text mining, as they assume a structure in the source data which
is not present in free text. In addition, while the assessment of discovered
knowledge in the context of KDD is a key aspect for producing an effective outcome,
the assessment of the patterns discovered from text has been a neglected topic in
the majority of text mining approaches and applications.

One of the main features of the most sophisticated approaches to text mining
which use a high-level representation (i.e., not just keywords) is an intensive use
of electronic linguistic resources including ontologies, thesauri, etc., which highly
restricts the application of the unseen patterns to be discovered, and their domain
independence.

In this context, using evolutionary computation techniques (i.e., Genetic
Algorithms) for mining purposes [5] has several promising advantages over the
usual analysis methods employed in KDT: the ability to perform global search,
the exploration of solutions in parallel, the robustness to cope with noisy and
missing data, and the ability to assess the goodness of the solutions as they are
produced in terms of the problem domain.

* This research is sponsored by the National Council for Scientific and Technological
Research (FONDECYT, Chile) under grant number 1040469 “Un Modelo Evolucionario
de Descubrimiento de Conocimiento Explicativo desde Textos con Base
Semantica con Implicaciones para el Analisis de Inteligencia.”

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 275-285, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

J. Atkinson-Abutridy

Accordingly, in this paper we discuss a new semantically-guided model for 
KDT which brings together the benefits of shallow text processing and 
multi-objective evolutionary computation to produce novel and interesting 
knowledge. 



2 Text Mining and Knowledge Discovery from Texts 

In performing typical text mining tasks, many researchers worldwide have assumed
a typical Bag-of-Words (BOW) representation for text documents, which makes them
easy to analyse but restricts the kind of knowledge discovered. Furthermore,
the discoveries rely on patterns in the form of numerical associations
between terms from the documents, which fail to provide explanations of, for
example, why these terms show a strong connection. In consequence, no deeper
knowledge or evaluation of the discovered knowledge is considered.

Many recent KDT and text mining applications show a tendency to start
using more structured or deeper representations than just keywords (or terms)
to perform further analysis so as to discover unseen patterns. Early research of
this kind can be traced to seminal work by Swanson [10] on exploratory
analysis of the titles of articles stored in the MEDLINE medical database.
Swanson designed a system to infer key information by using simple patterns
which recognize causal inferences such as "X cause Y" and more complex
implications, which lead to the discovery of hidden and previously neglected
connections between concepts. This work provided evidence that it is possible
to derive new patterns from a combination of text fragments plus the explorer’s
medical expertise.

Further approaches have exploited these ideas by combining more 
elaborated Information Extraction (IE) patterns and general lexical resources 
(e.g., WordNet) [6] or specific concept resources. They deal with automatic 
discovery of new lexicosemantic relations by searching for corresponding defined 
patterns in unrestricted text collections so as to extend the structure of the given 
ontology. 

A different view, in which linguistic resources such as WordNet are used to
assist the discovery and to evaluate the unseen patterns, is followed by Mooney
and colleagues [1], who propose a system to extract basic information (i.e., rules
containing attribute-value pairs) from general documents by using IE extraction
patterns. Furthermore, human subjects assess the real interestingness of the
most relevant patterns mined by the system. The WordNet approach to evaluation
has proved to be well correlated with human judgments, and unlike the
other methods, Mooney’s is the only one which deals with the whole process of
knowledge discovery: mining and pattern evaluation. However, the dependence
on the existence of a linguistic resource prevents the method from dealing with
specific terminology, leading to missing and/or misleading information.



3 An LSA-Guided Model for Effective Text Mining

We developed a semantically-guided model for evolutionary Text Mining which is
domain-independent but genre-based. Unlike previous approaches to KDT, our
approach does not rely on external resources or descriptions, hence its domain
independence. Instead, it performs the discovery using only information from
the original corpus of text documents and from the training data generated from
them. In addition, a number of strategies have been developed for automatically
evaluating the quality of the hypotheses.

We have adopted Genetic Algorithms (GAs) as central to our approach to 
KDT. However, for proper GA-based KDT there are important issues to be 
addressed including representation and guided operations to ensure that the 
produced offspring are semantically coherent. 

In order to deal with these issues and so produce an effective text mining
process, our working model has been divided into two phases. The first phase is
the preprocessing step, aimed at producing both training information for further
evaluation and the initial population of the GA. The second phase constitutes the
knowledge discovery itself; in particular, it aims at producing and evaluating
explanatory unseen hypotheses.

The whole processing starts by performing the IE task which applies extrac- 
tion patterns and then generates a rule-like representation for each document 
of the specific domain corpus. Once generated, these rules, along with other 
training data, become the “model” which will guide the GA-based discovery. 

In order to generate an initial set of hypotheses, an initial population is
created by building random hypotheses from the initial rules, that is, hypotheses
containing predicate and rhetorical information from the rules are constructed.
The GA then runs until a fixed number of generations is reached. At the end, a
small set of the best hypotheses (i.e., discovered patterns) is obtained.

The description of the approach is organized as follows: section 3.1 presents 
the main features of the text preprocessing phase and how the representation 
for the hypotheses is generated. In addition, training tasks which generate the 
initial knowledge (semantic and rhetorical information) to feed the discovery 
are described. Section 3.2 describes constrained genetic operations to enable the 
hypotheses discovery, and proposes different evaluation metrics to assess the 
plausibility of the discovered hypotheses in a multi-objective context. 

3.1 Text Preprocessing and Representation 

The preprocessing phase has two main goals: to extract important information 
from the texts and to use that information to generate both training data and 
the initial population for the GA. 

It is well-known that processing full documents has inherent complexities [9],
so we have restricted our scope somewhat to consider a scientific genre involving
scientific/technical abstracts. From the structure of this kind of abstract,
important constituents can be identified such as Rhetorical Roles, Predicate
Relations, and Causal Relation(s).

In order to extract this initial key information from the texts, an IE module 
was built. Essentially, it takes a set of text documents, has them tagged through 
a previously trained Part-of-Speech (POS) tagger, and produces an intermediate 
representation for every document which is then converted into a general rule. 

In addition, key training data are captured from the corpus of documents 
itself and from the semantic information contained in the rules. This can guide 
the discovery process in making further similarity judgments and assessing the 
plausibility of the produced hypotheses. 

Following work by Kintsch [7] on Latent Semantic Analysis (LSA) incorporating
structure, we have designed a semi-structured LSA representation for text data in
which we represent predicate information and arguments separately once they
have been properly extracted in the IE phase.

We also included basic knowledge at the rhetorical and semantic levels, as well as
co-occurrence information, which can be effectively computed to feed and guide
the evolutionary discovery process. Accordingly, we perform two kinds of tasks:
creating the initial population and computing training information from the rules
(i.e., correlations between rhetorical roles and predicate relations, co-occurrences
of rhetorical information, etc.).
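
As an illustration of the kind of training information involved, co-occurrences of rhetorical roles can be summarised as bi-gram statistics over the role sequences of the extracted rules. The sketch below is an assumption-laden example, not the author's code: the function name, the role labels ("goal", "method", ...) and the maximum-likelihood estimate are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, List, Tuple


def role_bigram_probabilities(
    rules: List[List[str]],
) -> Dict[Tuple[str, str], float]:
    """Estimate P(next role | previous role) from the rules' role sequences."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for roles in rules:
        for prev, nxt in zip(roles, roles[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


# Toy role sequences standing in for rules extracted from three abstracts.
rules = [
    ["goal", "method", "conclusion"],
    ["goal", "method", "result"],
    ["goal", "result"],
]
probs = role_bigram_probabilities(rules)
```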

3.2 Evolutionary Text Mining and Patterns Evaluation 

Our model for evolutionary Text Mining is strongly guided by semantic and
rhetorical information, and consequently there are some soft constraints to be
met before producing the offspring so as to keep them coherent.

The discovery process itself (i.e., a multi-objective Genetic Algorithm) starts
from an initial population, which in this case is a set of semi-random hypotheses
built up in the preprocessing phase. Next, constrained GA operations are applied
and the hypotheses are evaluated. In order for every individual
to have a fitness assigned, we use an evolutionary multi-objective optimization
strategy based on the SPEA algorithm [11], in a way which allows incremental
construction of a Pareto-optimal set and uses a steady-state strategy for the
population update.

Pattern Discovery. Using the semantic measure previously highlighted and
additional constraints discussed later on, we propose new operations to allow
guided discovery such that unrelated new text knowledge is avoided, as follows:

- Selection: selects a small number of the best parent hypotheses of every
generation (Generation Gap) according to their Pareto-based fitness.

- Crossover: a simple recombination of both hypotheses’ conditions and conclusions
takes place, where two individuals swap their conditions to produce
new offspring (the conclusions remain).

Under normal circumstances, crossover works on random parents and positions
where their parts should be exchanged. However, in our case this
operation must be restricted to preserve semantic coherence. We use soft
semantic constraints to define two kinds of recombination:

1. Swanson’s Crossover: based on Swanson’s hypothesis [10] we propose a
recombination operation as follows:

If there is a hypothesis (AB) such that “IF A THEN B” and another
one (BC) such that “IF B’ THEN C” (B’ being something semantically
similar to B), then a new interesting hypothesis “IF A THEN C” can
be inferred, only if the conclusions of AB have high semantic similarity
(i.e., via LSA) with the conditions of hypothesis BC.

2. Default Semantic Crossover: if the previous transitivity does not apply,
then the recombination is performed as long as both hypotheses as a
whole have high semantic similarity, which is defined in advance by
providing minimum thresholds.

- Mutation: aims to make small random changes to hypotheses to explore
new possibilities in the search space. This is performed in a semantically
constrained way at the role, argument, and predicate levels.

- Population Update: we use a non-generational GA in which some individuals
are replaced by the new offspring in order to preserve the hypotheses’ good
material from one generation to the next, and so to encourage the improvement
of the population’s quality. We use a steady-state strategy in which each
individual from a small number of the worst hypotheses is replaced by an
individual from the offspring only if the latter is better than the former.
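
The two recombination operators can be sketched as follows. This is a hedged illustration rather than the author's implementation: hypotheses are reduced to a single condition and a single conclusion, the LSA similarity is replaced by a cosine over hand-made toy vectors, the "whole hypothesis" vector is assumed to be the sum of its parts, and the thresholds are arbitrary.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    condition: str
    conclusion: str


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def swanson_crossover(ab: Hypothesis, bc: Hypothesis,
                      vectors: Dict[str, List[float]],
                      threshold: float = 0.8) -> Optional[Hypothesis]:
    """Infer IF A THEN C when AB's conclusion is close to BC's condition."""
    if cosine(vectors[ab.conclusion], vectors[bc.condition]) >= threshold:
        return Hypothesis(ab.condition, bc.conclusion)
    return None


def default_crossover(h1: Hypothesis, h2: Hypothesis,
                      vectors: Dict[str, List[float]],
                      threshold: float = 0.4
                      ) -> Optional[Tuple[Hypothesis, Hypothesis]]:
    """Swap conditions when the hypotheses as a whole are similar enough."""
    whole1 = [a + b for a, b in zip(vectors[h1.condition], vectors[h1.conclusion])]
    whole2 = [a + b for a, b in zip(vectors[h2.condition], vectors[h2.conclusion])]
    if cosine(whole1, whole2) >= threshold:
        return (Hypothesis(h2.condition, h1.conclusion),
                Hypothesis(h1.condition, h2.conclusion))
    return None


# Toy "LSA" vectors; B' is deliberately close to B.
vectors = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0],
           "B'": [0.0, 0.9, 0.1], "C": [0.0, 0.0, 1.0]}
ab = Hypothesis("A", "B")
bc = Hypothesis("B'", "C")
child = swanson_crossover(ab, bc, vectors)
```

A caller would try `swanson_crossover` first and fall back to `default_crossover` when it returns `None`, which mirrors the order of the two cases above.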

Automatic Evaluation of Discovered Patterns. Since each pattern
(hypothesis) discovered by the model has to be assessed by different criteria,
usual methods for evaluating fitness are not appropriate. Hence Evolutionary
Multi-Objective Optimization (EMOO) techniques, which use the multiple criteria
defined for the hypotheses, are needed. For this, we propose EMOO-based
evaluation metrics to assess the hypotheses’ fitness in a domain-independent
way.

In order to establish evaluation criteria, we have taken into account different
issues concerning plausibility and quality itself. Accordingly, we have defined
eight evaluation criteria to assess the hypotheses: relevance, structure,
cohesion, interestingness, coherence, coverage, simplicity, and plausibility
of origin.

The evaluation criteria by which the hypotheses are assessed, and the questions
they are trying to address, are as follows:

- Relevance (How important is the hypothesis to the target question?): measures
the semantic closeness between the hypothesis’ predicates (relations
and arguments) and the target concepts. Relevance is computed from
compound vectors obtained in the LSA analysis, which follows work by
Kintsch on Predication [7]. We then propose an adaptation of the LSA-based
closeness so as to compute the overall relevance of the hypothesis in terms of
the “strength” which determines how closely related two concepts are to
both some predicate and its arguments.

- Structure (How good is the structure of the rhetorical roles?): measures
how much of the rules’ structure is exhibited in the current hypothesis. Since
we have previous preprocessing information regarding bi-grams of roles, the
structure is computed by following a Markov chain of the “bi-grams” of the
rhetorical information of each hypothesis.

- Cohesion (How likely is a predicate action to be associated with some specific
rhetorical role?): measures the degree of “connection” between rhetorical
information and predicate actions. The issue here is how likely some predicate
relation in the current hypothesis is to be associated with some rhetorical
role.

- Interestingness (How interesting is the hypothesis in terms of its antecedent
and consequent?): Unlike other approaches to measuring “interestingness”, which
use an external resource and rely on its organisation, we propose a different
view where the criterion can be evaluated from the semi-structured information
provided by the LSA analysis. The measure for a hypothesis is defined as
a degree of semantic dissimilarity (unexpectedness) between its antecedent
and its consequent.

- Coherence: This metric addresses the question of whether the elements of
the current hypothesis relate to each other in a semantically coherent way
(text coherence [3, 4]). We developed a simple method to measure coherence,
following work by [4] on measuring text coherence. Semantic coherence is
calculated by considering the average semantic similarity between consecutive
elements of the hypothesis.

- Coverage: The coverage metric tries to address the question of how much
the hypothesis is supported by the model (i.e., rules representing documents
and semantic information).

In order to deal with the criterion in the context of text mining, we say that
a hypothesis covers an extracted rule only if the predicates of this hypothesis
are roughly (or exactly, in the best case) contained in that rule.

- Simplicity (How simple is the hypothesis?): shorter and/or easy-to-interpret
hypotheses are preferred. Since the criterion has to be maximized, the
evaluation will depend on the length (number of elements) of the hypothesis.

- Plausibility of Origin (How plausible is the hypothesis produced by Swanson’s
evidence?): If the current hypothesis is an offspring of parents
which were recombined by a Swanson’s transitivity-like operator, then the
higher the semantic similarity between one parent’s consequent and the other
parent’s antecedent, the more precise the evidence is, and consequently the more
the hypothesis is worth exploring as a novel one.
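
Three of these criteria are described precisely enough to sketch. The code below is a minimal illustration under stated assumptions: LSA vectors are given as plain lists of floats, similarity is taken to be the cosine, and the exact normalisations used by the author are not stated in the text, so the formulas here (one minus cosine, plain average, product of bi-gram probabilities) are only plausible readings. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def interestingness(antecedent: Vector, consequent: Vector) -> float:
    """Semantic dissimilarity (unexpectedness) between the two parts."""
    return 1.0 - cosine(antecedent, consequent)


def coherence(elements: List[Vector]) -> float:
    """Average similarity between consecutive elements of the hypothesis."""
    pairs = list(zip(elements, elements[1:]))
    if not pairs:
        return 1.0  # assumption: a single element is trivially coherent
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def structure(roles: List[str], start_prob: Dict[str, float],
              bigram_prob: Dict[Tuple[str, str], float]) -> float:
    """Probability of the role sequence under a first-order Markov chain."""
    if not roles:
        return 0.0
    score = start_prob.get(roles[0], 0.0)
    for prev, nxt in zip(roles, roles[1:]):
        score *= bigram_prob.get((prev, nxt), 0.0)
    return score
```

For instance, `structure(["goal", "method"], {"goal": 0.5}, {("goal", "method"): 0.8})` multiplies the start probability by the single transition probability.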

Note that since we are dealing with a multi-objective problem, there is no 
simple way to get independent fitness values as the fitness involves a set of ob- 
jective functions to be assessed for every individual. Therefore the computation 
is performed by comparing objectives of one individual with others in terms of 
Pareto dominance [2] in which non-dominated solutions are searched for in every 
generation. 

We took a simple approach in which an approximation to the Pareto optimal 
set is incrementally built as the GA goes on. The basic idea is to determine 
whether a solution is better than another in global terms, that is, a child is better 
if it is a candidate to become part of the Pareto front. 
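
The idea of incrementally building an approximation to the Pareto front can be illustrated with the following minimal sketch. It assumes that every objective is to be maximized and that each individual is reduced to a tuple of objective values. The names `dominates` and `update_front` are hypothetical, and the sketch covers neither the bound on the size of the Pareto set nor the SPEA fitness assignment.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_front(front, candidate):
    """Return the non-dominated set after offering one new candidate."""
    if any(dominates(member, candidate) for member in front):
        return front  # the candidate is dominated, so the front is unchanged
    # Drop the members that the candidate dominates, then add the candidate.
    survivors = [m for m in front if not dominates(candidate, m)]
    if candidate not in survivors:
        survivors.append(candidate)
    return survivors


# Toy objective vectors for five successive children.
front = []
for child in [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.6, 0.5), (0.2, 0.9)]:
    front = update_front(front, child)
```

In this toy run the child (0.6, 0.5) replaces (0.5, 0.5), while (0.4, 0.4) is rejected because an existing member already dominates it.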

Next, since our model is based on a multi-criteria approach, we have to face 
three important issues in order to assess every hypothesis’ fitness: Pareto 
dominance, fitness assignment and the diversity problem [2]. Despite an important 
number of state-of-the-art methods to handle these issues [2], only a small 
number of them have focused on the problem in an integrated and 
representation-independent way. In particular, Zitzler [11] proposes an interesting method, 
the Strength Pareto Evolutionary Algorithm (SPEA), which uses a mixture of 
established methods and new techniques in order to find multiple Pareto-optimal 
solutions in parallel, and at the same time to keep the population as diverse 
as possible. We have also adapted the original SPEA algorithm, which uses an 
elitist strategy, to allow for the incremental updating of the Pareto-optimal set 
along with our steady-state replacement method. 

4 Experiments and Results 

In order to assess the quality of the discovered knowledge (hypotheses) by the 
model, a computer prototype has been built. For the purpose of the experiments, 
the corpus of input documents has been obtained from agricultural articles. Next, 
a set of 1000 documents was extracted from which one third were used for setting 
parameters and making general adjustments, and the rest were used for the GA 
itself in the evaluation stage. 

Next, we tried to provide answers to two basic questions concerning our 
original aims: How well does the GA for text mining behave? and How good are 
the hypotheses produced, according to human experts, in terms of text mining’s 
ultimate goals (interestingness, novelty, usefulness, etc.)? 

In order to address these issues, we used a methodology consisting of two 
phases: the system evaluation and the experts’ assessment. 

1. System Evaluation: this aims at investigating the behavior and the results 
produced by the GA. 

We set the GA by generating an initial population of 100 semi-random 
hypotheses. In addition, we defined the main global parameters such as Mu- 
tation Probability (0.2), Crossover Probability (0.8), Maximum Size of Pareto 
set (5%), etc. We ran five versions of the GA with the same configuration of 
parameters but different pairs of terms to address the quest for explanatory 
novel hypotheses. 

The different results obtained from running the GA as used for our exper- 
iment are shown in the form of a representative behavior in figure 1, where 
the number of generations is placed against the average objective value for 
some of the eight criteria. 

Some interesting facts can be noted. Almost all the criteria seem to sta- 
bilize after generation 700 for all the runs, that is, no further improvement 
beyond this point is achieved and so this may give us an approximate indi- 
cation of the limits of the objective function values. 
[Figure 1 panels: Coherence; Swanson Plausibility]

Fig. 1. Evolutionary Evaluation for some of the criteria 



Note also that not all the criteria move in the same direction. Look at 
the results for the criteria for the same period of time, between generations 
200 and 300 for run 4. For an average hypothesis, the quality of Coherence, 
Cohesion, Simplicity and Structure gets worse, whereas it improves 
for Coverage, Interestingness and Relevance, and shows some variation for 
(Swanson) Plausibility. 

2. Expert Assessment: this aims at assessing the quality (and therefore, effec- 
tiveness) of the discovered knowledge on different criteria by human domain 
experts. For this, we designed an experiment in which 20 human experts were 
involved and each assessed 5 hypotheses selected from the Pareto set. We 
then asked the experts to assess the hypotheses from 1 (worst) to 5 (best) 
in terms of the following criteria: Interestingness (INT), Novelty (NOV), 
Usefulness (USE), Sensibleness (SEN), etc. 

In order to select worthwhile terms for the experiment, we asked one domain 
expert to filter pairs of target terms previously related according to traditional 
clustering analysis. The pairs which finally deserved attention were used as input 
in the actual experiments (e.g., glycocide and inhibitors). 

Once the system hypotheses were produced, the experts were asked to score 
them according to the five subjective criteria. Next, we calculated the scores for 
every criterion as seen in the overall results in figure 2. 




Fig. 2. Human Experts’ Assessment of Discovered Patterns 
The assessment of individual criteria shows some hypotheses did well with 
scores above the average on a 1-5 scale. This is the case for hypotheses 11, 16 
and 19 in terms of INT, hypotheses 14 and 19 in terms of SEN, hypotheses 1, 5, 
11, 17 and 19 in terms of USE, and hypothesis 24 in terms of NOV, etc. 

Note also that the assessment seems to be consistent for individual hypotheses 
across the criteria: hypothesis 19 is well above the average for almost all the 
criteria (except for NOV), hypothesis 18 always received a score below 2 (25%) 
except for ADD in which this is slightly higher. Similar situations can be observed 
for hypotheses 2, 21, etc. 

From this automatic evaluation for the discovered patterns, we measured the 
correlation between the scores of the human subjects and the system’s model 
evaluation. Since both the expert and the system’s model evaluated the results 
considering several criteria, we first performed a normalization aimed at produc- 
ing a single “quality” value for each hypothesis. 

We then calculated the pair of values for every hypothesis and obtained a 
(Spearman) correlation r = 0.43 (t-test = 23.75, df = 24, p < 0.001). From this 
result, we see that the correlation shows a good level of prediction compared to 
humans. This indicates that for such a complex task (knowledge discovery), the 
model’s behavior is not too different from the experts’. Note that unlike other 
approaches, our model was able to do it better without any external linguistic 
resources. 
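
The comparison just described (one "quality" value per hypothesis on each side, followed by a rank correlation) can be sketched as follows. This is a minimal sketch under stated assumptions: the single quality value is taken to be a plain average of the criterion scores, which the text does not specify, and all the scores below are invented for illustration only.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def quality(criteria):
    """Collapse several criterion scores into one value (assumed: plain average)."""
    return sum(criteria) / len(criteria)


# Hypothetical scores for four hypotheses: system criteria and expert marks.
system = [quality(v) for v in [[0.2, 0.4], [0.6, 0.8], [0.5, 0.5], [0.9, 0.7]]]
experts = [quality(v) for v in [[2, 1], [4, 3], [3, 5], [5, 4]]]
rho = spearman(system, experts)
```

Because the correlation is computed on ranks, the different scales of the two evaluations (values in [0, 1] for the system, marks from 1 to 5 for the experts) do not affect the result.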

In order to show what the final hypotheses look like and how the good char- 
acteristics and less desirable features as above are exhibited, we picked one of 
the best hypotheses as assessed by the experts considering the average value of 
the 5 scores assigned by the user. For example, hypothesis 88 of run 3 looks like: 

IF goal(show(11511)) and method(use(25511)) THEN effect(1931, 1932) 

where the numerical values represent internal identifiers for the arguments 
and their semantic vectors. Its resulting criteria vector is 
[0.29, 0.18, 0.41, 0.030, 0.28, 0.99, 0.30, 0.50] (the vector’s elements represent the 
values for the criteria relevance, structure, coherence, cohesion, interestingness, 
plausibility, coverage, and simplicity), and it obtained an average expert 
assessment of 3.20. In natural-language text, this can roughly be interpreted as: 

IF the goal is to show that the forest restoration . . 

AND the method is based on the use of micro-environments for 
capturing farm mice.. 

THEN digestibility "in vitro" should have an effect on the bigalta 
cuttings . . 

This hypothesis looks more complete than others (goal, methods, etc) but is 
less relevant than the previous hypothesis despite its close coherence. Note also 
that the plausibility is much higher than for hypothesis 65, but the other criteria 
seemed to be a key factor for the human experts. 
5 Conclusions 

In this paper, a unique approach for evaluation is introduced which deals with 
semantic and Data Mining issues in a high-level way. In this context, the proposed 
representation for hypotheses suggests that performing shallow analysis 
of the documents and then capturing key rhetorical information may be a good 
level of processing which constitutes a trade-off between completely deep and 
keyword-based analysis of text documents. In addition, the results suggest that 
the performance of the model in terms of the correlation with human judgments 
is slightly better than that of approaches using external resources as in [1]. For 
particular criteria, the model shows a very good correlation between the system 
evaluation and the expert assessment of the hypotheses. 

In addition, unlike traditional approaches to Text Mining, in this paper we 
contribute an innovative way of combining additional linguistic information and 
evolutionary learning techniques in order to produce novel hypotheses which 
involve explanatory and effective novel knowledge. 

The model deals with the hypothesis production and evaluation in a very 
promising way which is shown in the overall results obtained from the experts’ 
evaluation and the individual scores for each hypothesis. However, it is important 
to note that unlike human experts who have a lot of experience, preconceived 
concept models and complex knowledge in their areas, the system has done 
relatively well only exploring the corpus of technical documents and the implicit 
connections contained in it. 
Abstract. The ultimate goal of the poetry assistant currently under development 
in our lab is an application to be used either as a poetry game or as a teaching tool 
for both poetry and grammar, including the complex relationships between sound 
and meaning. Until now we focused on the automatic classification of poems and 
the suggestion of the ending word for a verse. The classification module is based 
on poetic concepts that take into account structure and metrics. The prediction 
module uses several criteria to select the ending word: the structural constraints 
of the poem, the grammatical category of the words, and the statistical language 
models obtained from a text corpus. 

The first version of the system, rather than being self-contained, is still based 
on the use of different heterogeneous modules. We are currently working on a 
second version based on a modular architecture that facilitates the reuse of the 
linguistic processing modules already developed within the lab. 



1 Introduction 

This paper describes the early stages of the development of a system to assist poetry 
writing. Its main goals are the classification of poems and the suggestion of verses’ 
ending words. The poem classification is based on poetic concepts, such as the number 
of verses in the stanzas; the stanzaic form; the number of syllables on the verses; and the 
rhyme scheme. The suggestion of the ending word of a verse may be based on different 
selection criteria or combination of criteria: the structural constraints of the poem, the 
grammatical category of the words, and the statistical language models obtained from a 
text corpus. 

With such a system, we intend on the one hand to help students understand the structure and 
rhyme of poems, which may be important to improve their reading-aloud capabilities, 
and, on the other hand, to encourage them to write their own poems. 

Although there are some Internet sites that publish and discuss poetry for Portuguese 
[1,2], there are no computer programs to assist the process of writing poems for our 
language and we could not find any framework that handles rhyme correctly. A rhyme 
dictionary [3] exists, but it only takes into account the final letters of the words, not their 
pronunciation. 
For the English language, we could find some interesting word games where the user 
can make poems with those words [4], programs that allow the generation of nonsense 
poems based on syntax definitions [5], and others that generate poetry based on models 
created from poetry corpora [6]. 

One of the interesting features of our application is the fact that it relies on tools 
which, with few exceptions, have already been implemented in the lab. Hence, the first 
version of the system, rather than being self-contained, is still based on the use of 
different heterogeneous modules. We are currently working on a second version based 
on a modular architecture that facilitates the reuse of the already existent linguistic 
processing modules. 

We shall start the description of our poetry assistant system with the classification 
module (Section 2), focusing on the submodule that presents some important research 
challenges - the metric syllable counter. Section 3 describes the prediction module, and 
the selection criteria that can be combined to suggest the missing words. Finally, Section 
4 describes the new system architecture we are currently working on. 

The two following sections report informal evaluation experiments conducted using 
a very small test corpus that was manually classified. In addition to around 20 stanzas 
from two very well known Portuguese poets, the corpus also includes around 200 poems 
written by children (ages 7-9). The corpus collection task is by no means complete 
yet. In fact, it has recently been enlarged with the poems collected in the scope of a 
national project dealing with digital spoken books [7]. The informal tests described 
below, however, do not yet include this recent addition. 

2 Classification Module 

This module currently takes as input a finished poem, and yields as output the number 
of lines and stanzas, the stanzaic form, and the rhyme scheme. 

One of the main tools used by this module is a Grapheme-to-Phone conversion 
tool. Several approaches have been developed in the Lab for this purpose: based on 
rules, neural networks [8], and classification and regression trees [9]. The current GtoP 
module of the DIXI+ system is based on CARTs trained on a lexicon and has a word 
error rate of 3.8%, slightly higher than the rules system. Our latest efforts in terms of 
GtoP conversion have expressed the original rules as FSTs [10]. 

The GtoP tool yields the phonetic transcription of each verse, taking into account 
generic word co-articulation rules. It outputs a string of symbols of the SAMPA phonetic 
alphabet for European Portuguese [11], as well as lexical stress markers. 

The following is an example of the output of the classification module, using the 
first stanza of the most famous epic poem in Portuguese (Os Lusíadas, by Camões, XVI 
century): 

As armas e os barões assinalados 
Que da ocidental praia lusitana 
Por mares nunca dantes navegados 
Passaram ainda além da Taprobana, 
Em perigos e guerras esforçados 
Mais do que prometia a força Humana, 
E entre gente remota edificaram 
Novo Reino, que tanto sublimaram; 

Summary: 
lines: 8 
verses: 8 
stanzas: 1 
rhyme: [A, B, A, B, A, B, C, C] 
syllables: [10, 10, 10, 10, 10, 10, 10, 10] 
Classification: "Ottava rima" 

Although the relevance of the phonetic transcription tool for the rhyme classification 
module is not well illustrated by this example (where the same conclusions over rhyme 
could be derived from the orthography alone), this is not always the case. The rhyming 
criterion illustrated above requires perfect rhymes, where the last stressed vowel and 
succeeding sounds are identical. An alternative criterion could be rhyming only in terms 
of the last two syllabic nuclei (e.g. dentro and vento, silêncio and cinzento). 
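
The perfect-rhyme criterion and the derivation of a rhyme scheme can be illustrated with the following minimal sketch. It assumes transcriptions in which a double quote precedes each stressed syllable and "$" separates the other syllables. The vowel set is an assumed subset of the SAMPA symbols, and the function names and sample transcriptions are hypothetical rather than output of the GtoP tool.

```python
VOWELS = set("ieEa6Oou@")  # assumed subset of SAMPA vowel symbols


def rhyme_part(sampa):
    """Sounds from the last stressed vowel to the end of a transcribed verse."""
    tail = sampa[sampa.rfind('"') + 1:]  # text after the last stress marker
    tail = tail.replace("$", "")         # drop syllable boundaries
    for pos, symbol in enumerate(tail):
        if symbol in VOWELS:
            return tail[pos:]            # start at the stressed vowel
    return tail


def rhyme_scheme(keys):
    """Label verses A, B, C, ... in order of first appearance of each rhyme key."""
    labels = {}
    scheme = []
    for key in keys:
        if key not in labels:
            labels[key] = chr(ord("A") + len(labels))
        scheme.append(labels[key])
    return scheme


# Hypothetical transcriptions of three verse-final words.
same = rhyme_part('"la$du') == rhyme_part('"da$du')
different = rhyme_part('"la$du') != rhyme_part('"mO$R@')

# Rhyme keys for an eight-verse stanza; orthographic endings stand in for
# phonetic rhyme parts here, purely to keep the example short.
scheme = rhyme_scheme(["ados", "ana", "ados", "ana", "ados", "ana", "aram", "aram"])
```

Two verses rhyme under this criterion exactly when their rhyme parts are equal, so the scheme follows from grouping verses by that key.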

2.1 Metric Structure 

One of the main research challenges of this project is dealing with the metric structure 
of the poems. In fact, counting metric syllables (up to the last stressed syllable) is not 
an easy task, if one takes into account word co-articulation phenomena and different 
possibilities of phonological reduction. 

In the current version of the classification module, syllabic boundary markers are 
placed a posteriori on the SAMPA string using a simple script based on generic rules. 
As an example of its application to the sixth verse in the above poem, we obtain the 
correct number of 10 metric syllables (syllable boundaries are marked by “$”, or by 
“"” before stressed syllables), accounting for vowel coalescence and diphthongisation (7th and 
9th syllables, respectively). 

"majZ$du$k@$pru$m@"ti$a"for$s6w"m6$n6 

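Counting metric syllables on such a marked string can be sketched as follows. This is only an illustration of the counting convention (syllables up to and including the last stressed one), assuming that every syllable is introduced either by "$" or by a stress mark. It is not the script used in the classification module, and the function name is hypothetical.

```python
def metric_syllables(sampa):
    """Count syllables up to and including the last stressed one."""
    count = 0
    last_stressed = 0
    in_syllable = False
    for symbol in sampa:
        if symbol == '"':        # a stress mark opens a stressed syllable
            count += 1
            last_stressed = count
            in_syllable = True
        elif symbol == "$":      # a boundary closes the current syllable
            in_syllable = False
        elif not symbol.isspace() and not in_syllable:
            count += 1           # first sound of an unstressed syllable
            in_syllable = True
    return last_stressed


verse = '"majZ$du$k@$pru$m@"ti$a"for$s6w"m6$n6'
total = metric_syllables(verse)
```

For the verse above the function returns 10: the final unstressed syllable after the last stress mark is not counted.
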
We are currently working on another version which allows for some phonological 
variation not accounted for by the GtoP rules, yielding for each verse a range of values, 
rather than a single value, for the number of metric syllables. This work is closely 
related with our past work on pronunciation variation for phone alignment [12] where 
we have modeled variants that depend on the local immediate segmental context through 
rules implemented via finite state transducers. The main phonological aspects that the 
rules are intended to cover are: vowel devoicing, deletion and coalescence, voicing 
assimilation, and simplification of consonantal clusters, both within words and across 
word boundaries. Some common contractions are also accounted for, with both partial 
or full syllable truncation and vowel coalescence. Vowel reduction, including quality 
change, devoicing and deletion, is especially important for European Portuguese. In fact, 
as a result of vowel deletion, rather complex consonant clusters can be formed across 
word boundaries. 

Notice that vowel deletion can be deliberately marked in the orthography by the 
authors themselves (e.g. já como um ‘stigma by David Mourão Ferreira in “Dos anos 
Trinta”, in which the first vowel “e”, pronounced as a schwa and often deleted, is not 
included in the orthography). 

2.2 Meter 

The identification of the stressed syllables is also important for classifying the meter - 
rising (iambs and anapests), or falling (trochees and dactyls), depending on having one 
or two unstressed syllables followed by a stressed one or a stressed syllable followed by 
one or two unstressed syllables. It is also important for classifying each verse according 
to the position of the stressed syllable of the last rhyming word (oxytone words have the 
accent on the last syllable; paroxytone ones have the accent on the penultimate syllable; 
and proparoxytone words have the accent on the antepenultimate syllable). Proparoxytone 
verses are much less frequent. 
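
The classification of a verse by the position of its last stressed syllable can be illustrated as follows. This is a hypothetical sketch, not part of the system described here. It assumes a transcription in which a double quote precedes each stressed syllable and "$" separates the other syllables, and it does not attempt the rising/falling meter classification. The sample transcriptions of single words are invented for the example.

```python
def syllable_stresses(sampa):
    """One boolean per syllable: True where the syllable carries the stress mark."""
    stresses = []
    in_syllable = False
    for symbol in sampa:
        if symbol == '"':
            stresses.append(True)
            in_syllable = True
        elif symbol == "$":
            in_syllable = False
        elif not symbol.isspace() and not in_syllable:
            stresses.append(False)
            in_syllable = True
    return stresses


def verse_ending(sampa):
    """Name the verse ending from the number of syllables after the last stress."""
    stresses = syllable_stresses(sampa)
    if True not in stresses:
        return "unknown"
    after_stress = stresses[::-1].index(True)
    names = {0: "oxytone", 1: "paroxytone", 2: "proparoxytone"}
    return names.get(after_stress, "unknown")


ending = verse_ending('"majZ$du$k@$pru$m@"ti$a"for$s6w"m6$n6')
```

For the transcribed verse shown earlier, a single unstressed syllable follows the last stress, so the verse is classified as paroxytone.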

The meter classification part is not yet implemented. 

2.3 Informal Evaluation 

An informal evaluation of the classification module was done using our still small test 
corpus. The results obtained with the already implemented modules are very positive. 

A byproduct of this work was the development of the first electronic Portuguese 
rhyme dictionary, in which the search for rhyming words is made based on the word’s 
phonetic transcription. 

3 Word Prediction Module 

This module takes as input an incomplete poem, for which the user may request the 
suggestion of a word that rhymes with the other verses of the stanza. The suggestion 
may be based on different selection criteria or combination of criteria, previously defined 
by the user. The selection criteria should include: 

- words rhyming with the last verse 

- words rhyming with a given verse 

- metrical structure of the verse 

- words belonging to a given grammatical category (noun, adjective, verb, ...) 

- words belonging to a given ontology (animal, tree, flower, job, country, ...) 

- words that may follow the last word written by the user, according to some statistical 
language model 

The list of words that satisfy all the criteria defined by the user may be too large to 
allow its presentation to the user. Hence the module selects the 10 best words in this list. 

The prediction module uses the same GtoP and metric syllable counting tools as 
the classification module. In addition, it uses some linguistic analysis modules already 
available in the lab, for the generation of possible word classes that may follow the last 
word of the incomplete poem: 

- Smorph: Morphological Analyser [13] 

- PAsMo: Post-Morphological Analysis [14] 

- SuSAna: Surface Syntactic Analyser [15] 
The submodule related to the ontology criterion is not yet implemented. The submodule 
that involves statistical language models uses n-gram models built using the 
CMU-Cambridge Statistical Language Modeling toolkit [16], which provides support 
for n-grams of arbitrary size and for several discounting schemes. Rather than 
training new models with our restricted poetry corpus, we used the existing n-grams 
currently built for a broadcast news task (57k unigrams, 6M bigrams, 11M trigrams, ...). 

As an example of the application of the word prediction module, consider the following 
poem by António Aleixo: 

A quem prende a água que corre 
é por si próprio enganado; 
O ribeirinho não morre, 
vai correr por outro lado. 

Let us suppose that the last word (lado) was not included. By using a bigram model 
criterion, the first 10 words suggested would be (by decreasing order of frequency): 

lado, dos, a, de, que, o, para, e, dia, em, ... 

On the other hand, by using a rhyme criterion, the first 10 suggested words would 
be: 

lado, passado, resultado, dado, deputado, avançado, 
machado, demasiado, obrigado, advogado, ... 

The application of the number of metric syllables criterion (7 in this case, for the 
other verses), together with the above criteria, restricts the list of 2-syllable suggested 
words to: 

lado, dado, fado 

The missing word was thus successfully suggested by the prediction module. 
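
The way the selection criteria narrow down the candidate list can be sketched as follows. This is a minimal illustration under stated assumptions: each candidate carries a precomputed bigram count, rhyme key and syllable count, and the metric criterion is simplified to a per-word syllable count. All the entries in `LEXICON` (counts, rhyme keys) are invented, and the function name is hypothetical.

```python
def suggest(candidates, rhyme_key=None, syllables=None, n_best=10):
    """Filter candidates by the active criteria and return the n best by bigram count."""
    kept = [
        c for c in candidates
        if (rhyme_key is None or c["rhyme"] == rhyme_key)
        and (syllables is None or c["syllables"] == syllables)
    ]
    kept.sort(key=lambda c: c["bigram_count"], reverse=True)
    return [c["word"] for c in kept[:n_best]]


# Hypothetical lexicon entries; every number and rhyme key below is invented.
LEXICON = [
    {"word": "lado", "bigram_count": 120, "rhyme": "adu", "syllables": 2},
    {"word": "dos", "bigram_count": 90, "rhyme": "uS", "syllables": 1},
    {"word": "passado", "bigram_count": 40, "rhyme": "adu", "syllables": 3},
    {"word": "dado", "bigram_count": 25, "rhyme": "adu", "syllables": 2},
    {"word": "fado", "bigram_count": 5, "rhyme": "adu", "syllables": 2},
]

by_bigram = suggest(LEXICON)
by_rhyme = suggest(LEXICON, rhyme_key="adu")
combined = suggest(LEXICON, rhyme_key="adu", syllables=2)
```

Each additional criterion acts as a filter, and the language model score is used only to order the words that remain.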

3.1 Informal Evaluation 

An informal evaluation of this module was conducted using our still very small poetry 
corpus as test corpus and repeating the above procedure of removing the last word of 
each stanza. Our preliminary results led us to conclude that n-grams of higher order 
(3-grams, 4-grams, ...) are not very effective, as the missing word is not always among 
the first 10 most frequent n-grams, and in some cases it is not present at all. 

The effectiveness of the use of structural constraints was also compared with the use 
of the bigram models. Our first tests intended to compare the bigram criterion with the 
one using the number of metric syllables. Our preliminary results led us to conclude that 
in the 10 most frequent words produced by the second criterion, there are more words 
that could be used as a substitute of the missing word than in the corresponding list 
produced by the bigram criterion. Similar preliminary conclusions could be drawn when 
comparing the use of the bigram criterion with the rhyme one: the latter yielded a larger 
number of words that could be used as a replacement of the missing word. The last set 
of tests used a combination of the two structural constraints: number of syllables and 
rhyme. In this case, most of the words in the 10-best list could be used as a replacement. 
4 System Architecture 

As stated above, one of the interesting features of our poetry assistant is the reuse 
of several already available linguistic processing modules. Hence it was particularly 
important to choose a suitable architecture. We wanted it to be domain independent, 
language independent, distributed (over the internet) and normalized over each type 
of functional module. Moreover, we would like to have the possibility of aggregating 
existing modules to obtain a “new” module with a “new” interface. 

These design goals led to the development of GalInHa [17], a web-based user interface 
for building modular applications that enables users to access and compose modules 
using a web browser. GalInHa is the short name for Galaxy Interface Handler, inspired, as 
the name indicates, by GALAXY II [19], the architecture defined at MIT and used by the 
DARPA Communicator initiative. GALAXY II is a distributed system with a client-server 
architecture. It concentrates the communication between modules in a single module, 
the hub, using a standardized protocol. 

Since GalInHa deals with Galaxy process chains, all that is needed for the development 
of the poetry assistant is to define the corresponding chain and to connect the 
modules. Some of the needed linguistic analysis modules are already available through 
GalInHa (e.g. Smorph, PAsMo, SuSAna), although they accept/produce different data 
formats, so additional modules were needed to provide data format conversion. 

All that is needed to connect the new modules is to include an existing XSLT [20] 
processor into GalInHa. For that, we have to write a wrapper to call external applications. 
The other modules are currently being adapted to the GalInHa architecture. 

GalInHa’s Galaxy-based general architecture is shown in figure 1. 




Fig. 1. The Galaxy infrastructure, control servers, and user-side levels 



The application server is one of the interface’s key components. It provides the run- 
time environment for the execution of the various processes within the web application. 
Moreover, it maintains relevant data for each user, guarantees security levels and manages 
access control. The interface also uses the application server as a bridge between the 
interface’s presentation layer (HTML [24]/JavaScript [21], at the browser level) and the 
infrastructure. 

The presentation layer consists of a set of PHP scripts and classes [22] and related 
pages. It is built from information about the location (host and port) of the 
MultipleWebClient that provides the bridge with the Galaxy system the user wants to contact, and 
from XML [23] descriptions of the underlying Galaxy system (provided by the hub 
controller). This is a remake of a previous Java/servlets-based interface. 

Besides allowing execution of services on user-specified data, the interface allows 
users to create, store, and load service chains. Service chains are user-side service- or 
program sequences provided by the servers connected to the infrastructure: each service 
is invoked according to the user-specified sequence. Service chains provide a simple 
way for users to test sequences of module interactions without having to actually freeze 
those sequences or build an application. The interface allows not only inspection of 
the end results of a service chain, but also of its intermediate results. Service chains 
may be stored locally, as XML documents, and may be loaded at any time by the user. 
Even though, from the infrastructure’s point of view, service chains simply do not exist, 
selected service chains may be frozen into system-side programs and become available 
for general use. Other developments on the user side are currently being studied to 
address sharing of chain configurations without infrastructure support. 
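
The notion of a user-side service chain, with inspection of intermediate results and local storage as XML, can be illustrated with the following sketch. It is not the interface's actual code: the class `ServiceChain`, the XML element names and the three stand-in services are all hypothetical, and real services would be remote Galaxy servers rather than local functions.

```python
import xml.etree.ElementTree as ET


class ServiceChain:
    """A user-side sequence of services applied one after another."""

    def __init__(self, services):
        self.services = list(services)  # (name, callable) pairs

    def run(self, data):
        """Return the final result and every intermediate result, in order."""
        trace = []
        for name, service in self.services:
            data = service(data)
            trace.append((name, data))
        return data, trace

    def to_xml(self):
        """Serialize the sequence of service names (assumed format)."""
        root = ET.Element("chain")
        for name, _ in self.services:
            ET.SubElement(root, "service", name=name)
        return ET.tostring(root, encoding="unicode")


# Hypothetical stand-ins for linguistic processing modules.
def tokenize(text):
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def count(tokens):
    return len(tokens)


chain = ServiceChain([("tokenize", tokenize), ("lowercase", lowercase), ("count", count)])
result, trace = chain.run("Novo Reino que tanto sublimaram")
stored = chain.to_xml()
```

Keeping the trace alongside the final result is what allows the intermediate output of each service to be inspected without freezing the chain into a program.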

Modules may be included in the infrastructure in two ways: the first is to create the 
module anew or to adapt it so that it can be incorporated into the system; the second is to 
create a capsule for the existing module - this capsule then behaves as a normal Galaxy 
server would. 
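
The second option, a capsule around an existing program, can be sketched as follows. This is an illustration only: the class name `Capsule` is hypothetical, the wrapped "module" is a stand-in Python one-liner, and the Galaxy protocol itself is not modelled. The sketch only shows an unchanged command-line program being exposed as a callable service.

```python
import subprocess
import sys


class Capsule:
    """Wrap a command-line program so it can be called like an in-process service."""

    def __init__(self, command):
        self.command = list(command)

    def __call__(self, text):
        # Feed the input on stdin and return whatever the program writes to stdout.
        completed = subprocess.run(
            self.command, input=text, capture_output=True, text=True, check=True
        )
        return completed.stdout


# Stand-in external program: a Python one-liner that upper-cases its input.
shout = Capsule(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
output = shout("novo reino")
```

The wrapped program needs no modification, and only its input and output conventions matter to the capsule.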

Whenever possible or practical, we chose the second path. Favoring the second option 
proved a wise choice, since almost no changes to existing programs were required. In 
truth, a few changes, mainly regarding input/output methods, had to be made, but these 
are much simpler than rebuilding a module from scratch: these changes were caused by 
the requirement that some of the modules accept/produce XML data in order to simplify 
the task of writing translations. This is not a negative aspect, since the use of XML 
as intermediate data representation language also acts as a normalization measure: it 
actually makes it easier for future users to understand modules’ inputs and outputs. It 
also eases the eventual conversion between module’s outputs and inputs (also needed on 
a command-line approach). 



5 Conclusions and Future Trends 

This paper reported on-going work on the development of a poetry assistant system 
which, besides involving many already existing tools in the lab, also presents some 
interesting research challenges, not only in terms of the overall architecture, but especially 
in terms of metric structure. The next stage of the project will focus on rhythm and 
intonation and on how these may convey differences of meaning. 

The last stage of the project will be devoted to evaluation. We plan to enlarge our test 
corpus in order to conduct some formal evaluation of the two modules (also assessing 
response time). In addition, we plan to do some user evaluation, using a panel of primary 
and high-school teachers and students, in order to evaluate the performance and the 
usability of the interface. 

We hope that this system will be a valuable e-learning tool which may be used in 
classrooms or at home, to classify poems, help understand the concepts of structure and 
rhyme, and encourage students to write their own poems by suggesting the next words. 
Abstract. In this paper we investigate ways of improving the performance 
of a Named Entity Extraction (NEE) system by applying machine 
learning techniques and corpus transformation. The main resources used 
in our experiments are the publicly available tagger TnT and a corpus 
of Spanish texts in which named entity occurrences are tagged with 
BIO tags. We split the NEE task into two subtasks: 1) Named Entity 
Recognition (NER), which involves the identification of the group of words 
that make up the name of an entity, and 2) Named Entity Classification 
(NEC), which determines the category of a named entity. We have focused 
our work on the improvement of the NER task, generating four different 
taggers with the same training corpus and combining them using a 
stacking scheme. We improve the baseline of the NER task (F_{β=1} value 
of 81.84) up to a value of 88.37. When a NEC module is added to the 
NER system the performance of the whole NEE task is also improved. 
A value of 70.47 is achieved from a baseline of 66.07. 



1 Introduction 

Named Entity Extraction involves the identification of words that make up the 
name of an entity, and the classification of this name into a set of categories. For 
example, in the following text, the words “Juan Antonio Samaranch” are the 
name of a person, the word “COI” is an organization name, “Río de Janeiro” is 
the name of a place and, finally, “Juegos Olímpicos” is an event name: 

El presidente del COI, Juan Antonio Samaranch, se sumó hoy a las 
alabanzas vertidas por otros dirigentes deportivos en Río de Janeiro 
sobre la capacidad de esta ciudad para acoger unos Juegos Olímpicos. 

In order to implement a system that extracts named entities from plain text we 
have to deal with two different problems: the recognition of a named entity and 
its classification. Named Entity Recognition (NER) is the identification of the 
word sequence that forms the name of an entity, and Named Entity Classification 
(NEC) is the subtask in charge of deciding which category is assigned to a 
previously recognized entity. 
There are systems that perform both subtasks at once. Other systems, however, 
make use of two independent subsystems to carry out each subtask sequentially. 
The second architecture allows us to choose the most suitable technique for 
each subtask. Named entity recognition is a typical grouping (or chunking) task, 
while choosing its category is a classification problem. In practice, it has been 
shown [3] that the division into two separate subtasks is a very good option. 

Our approach to the NEE problem is based on the separate architecture. In 
the development of the NER module we have followed these steps: 

— To eliminate from the corpus the information relating to the categories, leav- 
ing only the information about the identification of named entity boundaries. 

— To apply three transformations to the recognition corpus. Thus we have 
different views of the same information which enable the tagger to learn in 
different ways. 

— To train TnT with the four corpora available for the NER task (the original 
and the results of the three transformations) . 

— To combine the results of the four taggers in order to obtain a consensual 
opinion. This combination has been carried out applying a stacking scheme, 
where the results of the different models are used to generate a training 
database employed in a second stage of learning. 

The NEC module has been implemented by a classifier induced from a train- 
ing database. The database is obtained calculating a feature vector for each 
entity in the training corpus. The NEC classifier and the classifier used in the 
stacking scheme have been built with algorithms of the weka package [14]. 

Experiments show that the three transformations improve the results of the 
NER task, and that system combination achieves better results than the best of 
the participant models in isolation. This improvement in the NER task has a
positive effect on the performance of the NEE task. When a NEC module is applied
to the previously recognized entities, an improvement of more than four points 
is achieved with respect to the baseline defined for the NEE task. 

The organization of the rest of the paper is as follows. The second section 
presents the resources, measures and baselines used in our experiments. In sec- 
tion three we show how to improve the NER subtask applying corpus transfor- 
mations. In section four we describe how to combine the results of the four NER 
systems with a stacking scheme. Section five presents the NEC module and the 
results of its use in conjunction with the best NER system. Finally, in section 
six we draw the final conclusions and point out some future work. 

2 Resources and Baselines 

In this section we describe the main resources used in our experiments, and the 
baselines we have employed to measure the improvements achieved. 

2.1 The Corpus and the Tagger 

This corpus provides a wide set of named entity examples in Spanish. It was 
used in the NER task of CoNLL-02 [12]. The files are: 
— Training corpus with 264715 tokens and 18794 entities

— Test corpus with 52923 tokens and 4315 entities 

BIO notation is used to denote the limits of a named entity. The initial word 
of a named entity is tagged with a B tag, and the rest of words of a named 
entity are tagged with I tags. Words outside an entity are denoted with an 
O tag. There are four categories in the corpus taxonomy: PER (people), LOC 
(places), ORG (organizations) and MISC (rest of entities), so the complete set 
of tags is {B-LOC, I-LOC, B-PER, I-PER, B-ORG, I-ORG, B-MISC, I-MISC, 
O}. 

The NER task does not need the category information, so we have simplified 
the tag set removing the category information from the tags. Figure 1 shows a 
fragment of the original corpus, and its simplified version used in the NER task. 
Fig. 1. Original Corpus and Corpus Tagged Only for the Recognition Subtask
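
The simplification of the tag set can be illustrated with a minimal sketch. It assumes the corpus is held as a list of (word, tag) pairs; the function names are hypothetical and are not taken from the original system.

```python
def simplify_tag(tag: str) -> str:
    """Drop the category suffix: 'B-PER' -> 'B', 'I-LOC' -> 'I', 'O' -> 'O'."""
    return tag.split("-", 1)[0]


def simplify_corpus(tagged_tokens):
    """Return the corpus with boundary information only (tags B, I, O)."""
    return [(word, simplify_tag(tag)) for word, tag in tagged_tokens]


# Fragment of the example sentence from the Introduction, tagged by hand
# for illustration.
fragment = [
    ("El", "O"), ("presidente", "O"), ("del", "O"), ("COI", "B-ORG"),
    (",", "O"), ("Juan", "B-PER"), ("Antonio", "I-PER"), ("Samaranch", "I-PER"),
]
simplified = simplify_corpus(fragment)
```
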



We have chosen the tagger TnT as the basis for developing the NER systems
presented in this paper. TnT [1] is one of the most widely used re-trainable
taggers in NLP tasks. It is based upon second order Markov Models, consisting
of word emission probabilities and tag transition probabilities computed from 
trigrams of tags. 

2.2 Measures and Baselines 

The measures used in our experiments are precision, recall and the overall
measure $F_{\beta=1}$. These measures were originally used for Information
Retrieval evaluation purposes, but they have been adapted to many NLP tasks.

Precision is computed according to the number of correctly recognized enti- 
ties, and recall is defined as the proportion of the actual entities that the system 
has been able to recognize: 

$$\text{Precision} = \frac{\text{correctly extracted entities}}{\text{extracted entities}}$$

$$\text{Recall} = \frac{\text{correctly extracted entities}}{\text{actual entities}}$$

Finally, $F_{\beta=1}$ combines recall and precision in a single measure, giving
both the same relevance:

$$F_{\beta=1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$


We will rely on the $F_{\beta=1}$ measure for analyzing the results of our
experiments. It is a good performance indicator of a system and it is usually
used as a comparison criterion. Table 1 shows the results obtained when TnT is
trained with the original corpus (NEE_baseline) and with its simplified version
used in the NER subtask (NER_baseline). We will adopt these results as the
baselines for further experiments in this paper.
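
As an illustration of how these measures are computed, the following sketch evaluates a set of extracted entities against the actual ones. It assumes that each entity is represented as a (start, end) token span and that an entity counts as correct only on an exact match; both choices are assumptions made for the example.

```python
def evaluate(extracted: set, actual: set):
    """Return (precision, recall, F) for exact-match entity spans."""
    correct = len(extracted & actual)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical spans: the system proposes 3 entities, 2 of them correct,
# while the reference annotation contains 4 entities.
extracted = {(3, 4), (5, 8), (10, 11)}
actual = {(3, 4), (5, 8), (12, 13), (14, 16)}
precision, recall, f_measure = evaluate(extracted, actual)
```

Here precision is 2/3, recall is 1/2, and the combined measure is 4/7.
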



Table 1. Baselines. NEE and NER Results with TnT

               Precision   Recall    $F_{\beta=1}$
NEE_baseline   66.28%      65.85%    66.07
NER_baseline   81.40%      82.28%    81.84



The NER baseline is much higher than the NEE baseline because the NER 
problem is simpler than the whole NEE task. In this paper we will take the 
approach of improving the NER subtask to build an NEE system (adding a 
NEC module) that improves the NEE baseline. 

3 Improving NER Task Through Corpus Transformation 

It seems logical to think that if we have more information before taking a deci- 
sion we have more possibilities of choosing the best option. For this reason we 
have increased the number of models as a way of improving the performance of 
the NER task. There are two obvious ways of building new models: using new 
training corpora or training other taggers with the same corpus. We have tried a 
different approach, defining three transformations that give us three additional 
versions of the training corpus. Transformations can be defined to simplify the 
original corpus or to add new information to it. If we simplify the corpus, we 
reduce the number of possible examples and the sparse data problem will be 
smoothed. On the other hand if we enrich the corpus, the model can use new 
information to identify new examples not recognized by the original model. 

3.1 Vocabulary Reduction 

This transformation discards most of the information given by words in the 
corpus, emphasizing the most useful features for the recognition. We employ a 
technique similar to that used in [10] replacing the words in the corpus with 
tokens that contain relevant information for recognition. The goals pursued are: 
— To highlight certain typographic features of words, like capitalization, that
can help in deciding if a word is part of a named entity or not. 

— To give more relevance to words that can help in the identification of a 
named entity. 

— To group several words of the vocabulary into a single entry. 

Apart from typographic information there are other features that can be 
useful in the identification of entities, for example non-capitalized words that 
frequently appear before, after or inside named entities. We call them trigger 
words and they are of great help in the identification of entity boundaries. 

Both pieces of information, trigger words and typographic clues, are extracted 
from the original corpus through the application of the following rules: 

— Each word is replaced by a representative token, for example, _starts_cap_
for words that start with a capital letter, _lower_ for words that are written
in lower case letters, _all_cap_ if the whole word is upper case, etc. These
word patterns are identified using a small set of regular expressions.

— Not all words are replaced with their corresponding token; trigger words
remain as they appear in the original corpus. The list of trigger words is
computed automatically by counting the words that most frequently appear
around or inside an entity.
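
The two rules above can be sketched as follows. This is only an illustrative approximation: the regular expressions, the fallback token `_other_`, the `top_k` cut-off and the function names are assumptions, since the exact patterns and thresholds of the original system are not given here.

```python
import re
from collections import Counter

# Ordered patterns: the first one that matches decides the token.
PATTERNS = [
    (re.compile(r"^[A-ZÁÉÍÓÚÑ]+$"), "_all_cap_"),
    (re.compile(r"^[A-ZÁÉÍÓÚÑ]"), "_starts_cap_"),
    (re.compile(r"^[a-záéíóúñü]+$"), "_lower_"),
]


def find_trigger_words(tagged_tokens, top_k=50):
    """Count lower-case words that occur inside or next to an entity."""
    counts = Counter()
    for i, (word, tag) in enumerate(tagged_tokens):
        if not word.islower():
            continue
        prev_tag = tagged_tokens[i - 1][1] if i > 0 else "O"
        next_tag = tagged_tokens[i + 1][1] if i + 1 < len(tagged_tokens) else "O"
        if tag != "O" or prev_tag != "O" or next_tag != "O":
            counts[word] += 1
    return {word for word, _ in counts.most_common(top_k)}


def reduce_word(word, triggers):
    """Keep trigger words as they are; map every other word to a pattern token."""
    if word in triggers:
        return word
    for pattern, token in PATTERNS:
        if pattern.search(word):
            return token
    return "_other_"  # assumed fallback for punctuation, digits, etc.


tokens = [
    ("El", "O"), ("presidente", "O"), ("del", "O"), ("COI", "B"), (",", "O"),
    ("Juan", "B"), ("Antonio", "I"), ("Samaranch", "I"), ("visita", "O"),
    ("Rio", "B"), ("de", "I"), ("Janeiro", "I"),
]
triggers = find_trigger_words(tokens)
reduced = [reduce_word(word, triggers) for word, _ in tokens]
```
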

Figure 2 shows the result of applying these rules to the corpus fragment of 
Figure 1. Vocabulary reduction leads to an improvement in the performance of 
the NER subtask. The results of the experiment TnT-V are presented in Table
2; we can see that TnT improves from 81.84 to 83.63.

3.2 Change of Tag Set 

This transformation does not affect words but tags. The basic idea is
to replace the original BIO notation with a more expressive one that includes
information about the words that usually end a named entity. The new tag
set has five tags: the three original ones (although two of them slightly
change their semantics) plus two new tags:

— B, that denotes the beginning of a named entity with more than one word. 

— BE, that is assigned to a single-word named entity. 

— I, that is assigned to words that are inside a multiple-word named entity,
except the last word.

— E, assigned to the last word of a multiple-word named entity. 

— O, that preserves its original meaning: words outside a named entity. 

The new tag set gives more relevance to the position of a word, forcing the 
tagger to learn which words appear more frequently at the beginning, at the end 
or inside a named entity. Figure 2 shows the result of applying this new tag set to 
the corpus fragment of Figure 1. Changing the tag set also leads to better results 
in the NER task than those obtained with the original corpus. The results of 
the experiment TnT-N are shown in Table 2. In this case, TnT improves from
81.84 to 84.59, the best result of the three transformations studied. 
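
The change of tag set can be expressed as a small conversion routine. The sketch below assumes the input is a list of simplified BIO tags (B, I, O) for one text; the function name is hypothetical.

```python
def bio_to_extended(tags):
    """Map B/I/O tags to the five-tag set B, BE, I, E, O."""
    converted = []
    for i, tag in enumerate(tags):
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I"  # does the entity go on after this word?
        if tag == "B":
            converted.append("B" if continues else "BE")
        elif tag == "I":
            converted.append("I" if continues else "E")
        else:
            converted.append("O")
    return converted


example = ["O", "B", "O", "B", "I", "I", "O", "B", "I"]
extended = bio_to_extended(example)
```
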
3.3 Addition of Part-of-Speech Information

Unlike previous corpus transformations, in this case we will make use of external 
knowledge to add new information to the original corpus. Each word will be 
replaced with a compound tag that integrates two pieces of information: 

— The result of applying the first transformation (vocabulary reduction).

— The part of speech (POS) tag of the word. 

In order to obtain the POS tag of a word we have trained TnT with the
tagged corpus CLiC-TALP [4]. This corpus is a one hundred thousand word
collection of samples of written language; it includes extracts from newspapers,
journals, academic books and novels. Figure 2 shows the result of the application
of this transformation to the corpus fragment of Figure 1. Adding part-of-speech
information also improves the performance of TnT in the NER task. Table 2
presents the results of the experiment TnT-P; in this case TnT reaches an
$F_{\beta=1}$ measure of 83.12.




Fig. 2. Result of Applying Transformations (vocabulary reduction, change of tag
set, addition of POS information) to the Corpus Fragment Shown in Figure 1



4 Improving the NER Task Through System Combination

The three transformations studied cause an improvement in the performance of 
the NER task. But we still have room for improvement if instead of applying 
the transformations separately we make them work together. A superficial anal- 
ysis of the texts tagged by each model shows that not all the models make the 
same mistakes. There are some “very difficult” examples that are not recognized 
by any model, but many of the mistakes can be corrected taking into account 
the tag proposed by other models. System combination is not a new approach in
NLP tasks; it has been used in several problems like part-of-speech tagging [6],
word sense disambiguation [8], parsing [7] and noun phrase identification [11].



4.1 Stacking 

Stacking consists in applying machine learning techniques for combining the 
results of different models. The main idea is to build a combining system that 
learns the way in which each model is right or makes a mistake. In this way, the 
final decision is taken according to a pattern of correct and wrong answers. 

We need a training database in order to learn the way in which every
model is right or wrong. Each record in the training database includes all the
tags proposed by the participant models for a given word and the actual tag. In
order to enrich the training database we have included in the records the tags
of the two previous and the two following words; this way we take into account
not only the information associated with a word but also information about its
context. We make the database independent of the training and test corpus
by using an additional corpus of 51533 words to generate the records. Figure 3
presents an extract of the generated database. For each model there are five tags
that correspond, respectively, to the two previous words, the word in question
and the two following words. The last tag of each record is the actual tag of
the word represented by the record.
Fig. 3. Extract of the Generated Database
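
The construction of the stacking database can be sketched as follows. The sketch assumes that every model has tagged the same token sequence and that positions outside the text are filled with a padding symbol; the `PAD` symbol and the function name are assumptions, not details of the original system.

```python
def build_records(model_tags, gold_tags, window=2, pad="PAD"):
    """Build one record per word: a context window of tags for each model,
    followed by the actual tag of the word."""
    n = len(gold_tags)
    records = []
    for i in range(n):
        features = []
        for tags in model_tags:
            for j in range(i - window, i + window + 1):
                features.append(tags[j] if 0 <= j < n else pad)
        records.append(features + [gold_tags[i]])
    return records


# Two hypothetical models tagging a three-word text.
model_a = ["O", "B", "I"]
model_b = ["O", "B", "O"]
gold = ["O", "B", "I"]
records = build_records([model_a, model_b], gold)
```

With a window of two words on each side, every model contributes five tags per record, so a combination of k models yields records with 5k features plus the actual tag.
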



Apart from allowing the use of heterogeneous information, machine learning
has another important advantage over voting: it is possible to choose among a
great variety of schemes and techniques to find the most suitable one for each
problem. Bagging [2] is one of these schemes; it provides a good way of handling
the possible bias of the model towards some of the examples of the training
database. Bagging is based on the generation of several training data sets
from a single original data set. Each new version is obtained by sampling the
original database with replacement. Each new data set can be used to train a
model, and the answers of all the models can be combined to obtain a joint
answer. The joint answer is usually obtained by voting. In the experiment
TnT-Stack we have applied a bagging scheme (using decision trees [9] as base
classifier) to combine the results given by TnT-V, TnT-N and TnT-P. Table 2
shows the results of this experiment. With this system we obtain the best
result (88.37), an improvement of 6.53 points with respect to the baseline for
the NER subtask. Other authors also propose stacking as a way to improve the
performance of NEE systems, for example [5] and [15]. In both cases they use
several taggers to obtain the different opinions combined through the stacking
scheme. In this sense, the main contribution of our work is the use of corpus
transformation to obtain the different models needed to apply stacking. This
way we have the variability necessary to apply system combination without
using several training corpora or several taggers.

Table 2. Results of NER Experiments

               Precision   Recall    $F_{\beta=1}$
NER_baseline   81.40%      82.28%    81.84
TnT-V          81.76%      85.59%    83.63
TnT-N          85.04%      84.15%    84.59
TnT-P          81.51%      84.79%    83.12
TnT-Stack      88.68%      88.05%    88.37
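
The bagging scheme can be illustrated with a small self-contained sketch. Note that the base learner below is a deliberately simple one-rule classifier used as a stand-in; the experiments described here use decision trees from the weka package, which this sketch does not reproduce. The function names, the number of models and the random seed are assumptions.

```python
import random
from collections import Counter, defaultdict


def train_one_rule(records):
    """Stand-in base learner: choose the single feature whose
    value -> majority-class table best fits the sample."""
    n_features = len(records[0]) - 1
    default = Counter(r[-1] for r in records).most_common(1)[0][0]
    best = None
    for f in range(n_features):
        by_value = defaultdict(Counter)
        for r in records:
            by_value[r[f]][r[-1]] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        hits = sum(table[r[f]] == r[-1] for r in records)
        if best is None or hits > best[0]:
            best = (hits, f, table)
    _, feature, table = best
    return lambda features: table.get(features[feature], default)


def bagging(records, n_models=10, seed=0):
    """Train each model on a bootstrap sample (sampling with replacement)
    and combine the answers of all models by majority voting."""
    rng = random.Random(seed)
    models = [
        train_one_rule(rng.choices(records, k=len(records)))
        for _ in range(n_models)
    ]

    def predict(features):
        votes = Counter(model(features) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy database: the first feature is the tag proposed by one model,
# the second is an uninformative constant, the last value is the actual tag.
toy_records = [("B", "x", "B"), ("O", "x", "O"), ("I", "x", "I")] * 5
predict = bagging(toy_records)
```
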



5 The NEC Module 

After the NER stage, we have a text in which possible entities have been
identified without specifying the class to which they belong. At this point, the named entity
extractor is completed with a NEC module implemented by a classifier induced 
from a training database. The database is obtained by calculating a feature vec- 
tor for each entity in the training corpus. The features used to generate the 
vectors are the following: 

1. Orthographic: we check if an entity contains words that begin with capital
letters, or if they contain digits, Roman numerals, quotes, etc.

2. Length and Position: we use as feature the length in words of an entity, as 
well as the relative position within the phrase. 

3. Suffixes: frequent suffixes for each category are calculated from the training 
corpus. 

4. Contexts: relevant words for each category are calculated in a window of 
three words around the entities found in the training corpus. 

5. Content Words: for each category, the set of significant words is calculated 
eliminating stop words and those in small letters from the examples of the 
training corpus. 
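
As an illustration, the orthographic, length and position features (items 1 and 2) could be computed as in the sketch below. The feature names, the loose Roman-numeral pattern and the representation of an entity as a token span are assumptions; the corpus-derived features (suffixes, contexts and content words) are omitted because they depend on statistics that are not given here.

```python
import re

# Loose pattern: any sequence of Roman-numeral letters (not validated).
ROMAN = re.compile(r"^[IVXLCDM]+$")


def entity_features(sentence, start, end):
    """Features for the entity sentence[start:end] (end is exclusive)."""
    words = sentence[start:end]
    return {
        "all_capitalized": all(w[:1].isupper() for w in words),
        "has_digit": any(ch.isdigit() for w in words for ch in w),
        "has_roman_numeral": any(ROMAN.match(w) is not None for w in words),
        "has_quote": any(ch in "\"'«»" for w in words for ch in w),
        "length": len(words),
        "relative_position": start / len(sentence),
    }


sentence = ["El", "presidente", "del", "COI", ",", "Juan", "Antonio", "Samaranch"]
features = entity_features(sentence, 5, 8)
```
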

We have used a Support Vector Classifier [13] to implement this classification
task. With this method a binary classifier is defined through a hyperplane that
optimally divides the classes. Multi-class classifiers can be built by combining
binary classifiers with a pairwise ensemble scheme. It has been shown that the
optimal hyperplane is determined by only a small fraction of the data points,
the so-called support vectors. The support vector classifier training algorithm
is a procedure to find these vectors.

Table 3. Results of NEE Experiments

                Precision   Recall    $F_{\beta=1}$
NEE_baseline    66.28%      65.85%    66.07
NEC_alone       78.22%      78.22%    78.22
TnT-Stack-NEC   70.72%      70.22%    70.47

Table 3 shows the standalone performance of the NEC module (assuming no
NER errors) and the results of its application to the best NER system studied
in this paper (experiment TnT-Stack-NEC). An improvement of more than four
points with respect to the baseline defined for the NEE task is obtained.

6 Conclusions and Future Work 

In this paper we have shown that the performance of a named entity extraction 
system can be improved by applying a stacking scheme. We have split the NEE 
task into two subtasks: recognition and classification. Our work has been focused 
on demonstrating that the performance of the NER task can be improved by 
combining different NER systems that have been obtained with only one tagger 
(TnT) and one training corpus. The variability necessary to be able to apply 
system combination has been achieved applying transformations to the training 
corpus. The baseline for the NER task is improved from 81.84 to 88.37. This 
performance is similar to state of the art recognizers, with comparable results 
to those obtained by one of the best NER systems for Spanish texts [3] . 

As an additional conclusion we have demonstrated that the improvement of
the NER task also entails an improvement in the complete extraction task. The
experiments demonstrate that the baseline defined for the NEE task is clearly
surpassed, improving from 66.07 to 70.47.

Much future work remains. Recognition results are very good but the
classification results are still poor: we have implemented a very simple NEC
module, and there is still room for improvement in this aspect by including new
features and experimenting with new learning methods. We are also interested in
applying the ideas of this paper to the extraction of entities in specific
domains. In this kind of task, knowledge about the domain could be incorporated
into the system via new transformations. We also plan to take advantage of
system combination to help in the construction of annotated corpora, using the
jointly assigned tag as agreement criterion in co-training or active learning
schemes.
Abstract. The task of automatic text summarization consists of generating a
summary of the original text that allows the user to obtain the main pieces of 
information available in that text, but with a much shorter reading time. This is 
an increasingly important task in the current era of information overload, given 
the huge amount of text available in documents. In this paper the automatic text 
summarization is cast as a classification (supervised learning) problem, so that 
machine learning-oriented classification methods are used to produce 
summaries for documents based on a set of attributes describing those 
documents. The goal of the paper is to investigate the effectiveness of Genetic 
Algorithm (GA)-based attribute selection in improving the performance of 
classification algorithms solving the automatic text summarization task. 
Computational results are reported for experiments with a document base 
formed by news extracted from The Wall Street Journal of the TIPSTER
collection, a collection that is often used as a benchmark in the text
summarization literature. 



1 Introduction 

We are surely living in an era of information overload. Recent studies published by
the University of Berkeley [8] indicate that in 2002 about 5 million terabytes of
information were produced (in films, printed media or magnetic/optical storage media).
This number is twice the corresponding number for 1999,
which indicates a growth rate of about 30% per annum. The Web alone contains about
170 terabytes, which is roughly 17 times the size of the printed material in the US
Library of Congress.

On the other hand, it is very difficult to use the available information. Many
problems - such as the search for information sources, the retrieval/extraction of
information and the automatic summarization of texts - became important research
topics in Computer Science. The use of automatic tools for the treatment of
information became essential to the user, because without those tools it is virtually
impossible to exploit all the relevant information available in the Web [22].
In this scenario, the task of automatic text summarization is very important. The
goal of an automatic text summarization system is to generate a summary of the 
original text that allows the user to obtain the main pieces of information available in 
that text, but with a much shorter reading time [12]. The summaries are produced 
based on attributes (or features) that are usually derived empirically, by using 
statistical and/or computational linguistics methods. The values of these attributes are 
derived from the original text, and the summaries typically have 10%-30% of the size 
of the original text [11].

One of the approaches that has been recently used to perform automatic text 
summarization is the use of Machine Learning methods [13]. In this context automatic 
text summarization is cast as a classification (supervised learning) task [5], [6], as will 
be discussed in Section 3. Other approaches for text summarization (which do not 
involve machine learning) are described in [11], [12].

In addition, an important data preprocessing task for effective classification is the 
attribute selection task, which consists of selecting the most relevant attributes for 
classification purposes [7]. This task is important because many original attributes can
be irrelevant for classification, in which case their removal tends to improve the 
performance of the classification algorithm. Furthermore, attribute selection reduces 
the processing time taken by the classification algorithm, and it can also lead to the 
discovery of smaller, simpler classification models (e.g. smaller decision trees, as 
observed in this paper). 

The goal of the paper is to investigate the effectiveness of Genetic Algorithm 
(GA)-based attribute selection in improving the performance of classification 
algorithms solving the automatic text summarization task. GAs have been chosen as 
the attribute selection methods because they have been very successful in this data 
preprocessing task [3], [2]. This is mainly due to their ability to cope well with 
attribute interaction (which is the crucial problem in attribute selection) [2]. More 
precisely, this paper investigates the effectiveness of two GAs for attribute selection 
in improving the performance of two different kinds of classification algorithms - viz. 
a decision tree-induction algorithm and the Naive Bayes classifier. 

The remainder of this paper is organized as follows. Section 2 discusses GA-based 
attribute selection. Section 3 describes the ClassSumm system for summarization cast 
as a classification problem. Section 4 reports computational results. Finally, Section 5 
presents the conclusions and discusses future work. 



2 Attribute Selection with a Multi-objective Genetic Algorithm 

Attribute selection is one of the most important tasks that precedes the application of 
data mining algorithms to real world databases [7]. It consists of selecting a subset of 
attributes relevant to the target data mining task, out of all original attributes. In this 
paper, the target task is classification, and one attribute is considered relevant if it is 
useful for discriminating examples belonging to different classes. 

Attribute selection algorithms differ from each other in two main components:
the kind of search method they use to generate candidate attribute subsets and the way
they evaluate the quality of a candidate attribute subset. The search methods can be
classified in three main classes: exponential (e.g. exhaustive search), randomised (e.g.
genetic algorithms) and sequential (e.g. forward and backward sequential selection
[7]) methods. In this paper we are interested in genetic algorithms (GA), since they
are a robust search method, capable of effectively exploring large search spaces -
which is usually the case in attribute selection. They also have the advantage of
performing a global search - unlike many greedy, local search algorithms. In the
context of data mining, this global search means that GAs tend to cope better with
attribute interaction than greedy search methods [2].

The evaluation of the quality of each candidate solution can be based on two 
approaches: the filter and the wrapper approach. In essence, in the wrapper approach 
the attribute selection method uses the classification algorithm (as a black box) to 
evaluate the quality of a candidate attribute subset. In the filter approach the attribute 
selection method does not use the classification algorithm. We use the wrapper 
approach, because it tends to maximize predictive accuracy. (Note that this approach 
has the disadvantage of being significantly slower than the filter approach.) 

The attribute selection task usually involves the optimisation of more than one 
objective, e.g. the predictive accuracy and the comprehensibility of the discovered 
knowledge. This is a challenging problem, because the objectives to be optimised can 
be conflicting with one another and they normally are non-commensurable - i.e., they 
measure different aspects of the target problem. 

In the multi-objective optimisation framework [1], when many objectives are
optimised there is no single best solution. Rather, there is a set of optimal solutions,
each one involving a certain trade-off among the objectives. Multi-objective
optimisation is based on Pareto dominance, that is: a solution $S_1$ dominates another
solution $S_2$ iff $S_1$ is not worse than $S_2$ w.r.t. any objective and $S_1$ is strictly
better than $S_2$ w.r.t. at least one objective. In multi-objective optimisation the
system searches for non-dominated solutions.
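
The dominance relation can be written down directly. The sketch below assumes that every objective is to be minimised and that a solution is represented by a tuple of objective values; the function names and the example values are illustrative only.

```python
def dominates(s1, s2):
    """True iff s1 is not worse than s2 on any objective and strictly
    better on at least one (all objectives are minimised)."""
    not_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return not_worse and strictly_better


def non_dominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Hypothetical (error rate, tree size) pairs.
candidates = [(0.10, 30), (0.12, 20), (0.15, 25), (0.10, 35)]
front = non_dominated(candidates)
```

In this example (0.15, 25) is dominated by (0.12, 20) and (0.10, 35) is dominated by (0.10, 30), so the non-dominated set contains the two remaining solutions, neither of which is better than the other on both objectives.
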

MOGA is a Multi-Objective Genetic Algorithm designed to select attribute subsets 
for classification. It follows the basic ideas of GAs, i.e., it evolves a population of 
individuals, where each individual is a candidate solution to a given problem. In 
MOGA, each individual consists of M genes, where M is the number of original 
attributes in the data being mined. Each gene can assume values 0 or 1, indicating the
absence or presence of the corresponding attribute in the selected subset of attributes. 

Each individual is evaluated by a fitness function, which measures the quality of its 
attribute subset. At each generation (iteration) the fittest (the best) individuals of the 
current population survive and produce offspring resembling them, so that the 
population gradually contains fitter and fitter individuals - i.e., better and better 
candidate solutions to the underlying problem. The fitness function of MOGA is 
based on the wrapper approach, and involves the minimisation of both the 
classification error rate and the size of the decision tree built by J4.8 [21]. MOGA
searches for non-dominated solutions w.r.t. these two objectives. The version used in 
this paper returns, as the selected attribute subset, the non-dominated solution which 
dominates the largest number of solutions in the last generation. For more details 
about MOGA the reader is referred to [15], [16].
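
The individual encoding and the choice of the returned solution can be sketched as follows. This is not the MOGA implementation: the evolutionary loop and the wrapper evaluation with J4.8 are omitted, the objective values are invented for the example, and the tie-breaking rule (first candidate wins) is an assumption. The attribute names are borrowed from the ClassSumm attributes for illustration.

```python
def decode(individual, attribute_names):
    """Genes set to 1 mark the attributes present in the selected subset."""
    return [name for gene, name in zip(individual, attribute_names) if gene == 1]


def dominates(a, b):
    """Pareto dominance for objectives that are minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def select_final(population):
    """population: list of (individual, (error_rate, tree_size)) pairs.
    Return the non-dominated entry that dominates the most others."""
    front = [p for p in population
             if not any(dominates(q[1], p[1]) for q in population)]
    return max(front, key=lambda p: sum(dominates(p[1], q[1]) for q in population))


attributes = ["position", "size", "avg_tf_isf", "title_similarity"]
last_generation = [
    ([1, 0, 1, 1], (0.10, 30)),
    ([1, 1, 1, 1], (0.10, 35)),
    ([0, 0, 1, 0], (0.12, 20)),
    ([0, 1, 0, 0], (0.15, 35)),
    ([1, 1, 0, 0], (0.20, 40)),
]
best_individual, best_objectives = select_final(last_generation)
selected = decode(best_individual, attributes)
```
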



3 The ClassSumm System for Text Summarization 



The ClassSumm (Classification-based Summarizer) system, proposed by [5], [6], is a 
system for automatic text summarization based on the idea of casting that task as a 
classification task and then using corresponding Machine Learning methods. 
The system consists of the following main steps:

(1) the system extracts the individual sentences of the original documents, using
one of the approaches analysed in [18]; in this work the regular expression
approach was used;

(2) each sentence is associated with a vector of predictor attributes (features), 
whose values are derived from the content of the sentence; 

(3) each sentence is also associated with one of the following two classes: 
Summary (i.e., the sentence belongs to the summary) or Not-Summary (i.e., the 
sentence does not belong to the summary). 

This procedure allows us to cast text summarization as a classification, supervised 
learning problem. As usual in the classification task, the goal of the classification 
algorithm is to discover, from the data, a relationship (say, an IF-THEN classification 
rule) that predicts the correct value of the class for each sentence based on the values
of the predictor attributes for that sentence. More precisely, this casting leads to the
following steps for solving a text summarization problem: 

(1) The system constructs a training set where each example (record) corresponds 
to a sentence of the original documents, and each example is represented by a set of 
attribute values and a known class. 

(2) A classification algorithm is trained to predict each sentence’s class (Summary 
or Not-Summary) based on its attribute values. 

(3) Given a new set of documents, the system produces a test set with predictor 
attributes in the same format as the training set. However, the values of the classes are 
unknown in the test set. 

(4) Each sentence in the test set is classified, by the trained algorithm produced in 
step (2), in one of the two classes: Summary or Not-Summary. 

Note that this procedure does not take into account the size of the summary to be 
generated. In practice the user often wants a summary of a specified size - in terms of 
percentage of the original document size. In order to take this into account, one uses a 
classification algorithm that, instead of directly predicting the class of each sentence, 
assigns to each sentence a measure of the relevance of that sentence for the summary. 
This produces a ranking of the sentences. Then the top N sentences in that ranking are
assigned the class Summary and all the other sentences are assigned the class
Not-Summary, where N is a user-specified parameter.
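
The ranking step can be illustrated with a short sketch. It assumes that the classifier has already produced one relevance score per sentence; the scores and the function name below are hypothetical.

```python
def label_top_n(relevance, n):
    """Assign 'Summary' to the n highest-scoring sentences and
    'Not-Summary' to all the others, preserving sentence order."""
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    chosen = set(ranked[:n])
    return ["Summary" if i in chosen else "Not-Summary"
            for i in range(len(relevance))]


scores = [0.9, 0.2, 0.6, 0.4, 0.8]  # hypothetical relevance per sentence
labels = label_top_n(scores, n=2)
```
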

The classification algorithms used in the current version of ClassSumm are Naive 
Bayes [13] and C4.5 [17]. In the former the relevance of a sentence for the summary 
is directly obtained from the conditional probability of the class Summary given the 
attribute values in the sentence. In the case of C4.5 the relevance of a sentence for the 
summary is obtained from the confidence factor associated with each leaf node of the 
induced tree. 

The attributes used by ClassSumm can be categorized into two broad groups: 
shallow and deep attributes. Shallow attributes are based on heuristics and statistical 
methods; whereas deep attributes are based on linguistic knowledge. Both kinds of 
attributes are used in this paper. This work focuses on English texts only. 

As usual in text processing systems, a preliminary preprocessing phase is 
performed [19]. This phase consists of four steps:
(1) identifying the sentences of the document; (2) converting all characters to lower
case (case folding); (3) removing very common words (stop words) which do not
contribute to the meaning of the text - e.g., “the”, “a”, etc.; (4) removing suffixes
(i.e., performing stemming), so that words such as “learned” and “learning” are
converted to the standard form “learn”. These preprocessing steps help to
significantly reduce the number of words, which is very important to improve the
cost-effectiveness of automatic text summarization.
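
The four preprocessing steps can be sketched as below. The stop-word list, the suffix list and the sentence-boundary rule are deliberately tiny stand-ins chosen for illustration; a real system would use a proper sentence splitter and stemmer, and none of these names come from the paper.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}  # toy list
SUFFIXES = ("ing", "ed", "s")  # toy suffix list, not a real stemmer

def split_sentences(text):
    # Step 1 (naive): split after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def stem(word):
    # Step 4 (naive): strip the first matching suffix, keeping >= 3 letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    result = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z]+", sentence.lower())    # step 2: case folding
        words = [w for w in words if w not in STOP_WORDS]  # step 3: stop words
        result.append([stem(w) for w in words])            # step 4: stemming
    return result

tokens = preprocess("The student learned. Learning is fun!")
```

Here both “learned” and “learning” are reduced to “learn”, as in the example given in the text.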

After this preprocessing, each sentence of the document is represented by an
attribute vector consisting of the following elements: 

1. Position: indicates the position of the sentence in the text, in terms of 
percentile, as proposed by Nevill-Manning [14]; 

2. Size: indicates the number of terms (words) in the sentence; 

3. Average-TF-ISF: the TF-ISF (term frequency - inverse sentence frequency) 
measure [4] is a variation of the TF-IDF measure [20] widely used in 
information retrieval. (The difference between the two measures is explained 
in detail in [4].) The value of TF-ISF for each term of a sentence is 
computed, and the value of the Average-TF-ISF attribute for that sentence is
the average value over the TF-ISF values for all the terms in that sentence; 

4. Similarity to Title: The computation of this measure is based on the vectorial 
representation of the document, where each sentence is represented by a 
vector formed by its terms [20]. Initially the title of the document is 
preprocessed, forming a vector of terms, and then the similarity between each 
sentence and the title of the document is calculated by the co-sine measure [20];

5. Similarity to Keywords: Analogously to the previous attribute, this attribute is 
computed by using the vectorial representation of the document and 
calculating the similarity between each sentence and the vector of keywords 
by using the co-sine measure. This assumes the document has a set of author- 
provided keywords, which is the case in this work; 

6. Cohesion w.r.t. All Other Sentences: This attribute is computed by calculating 
the distance between a sentence and every other sentence in the document. 
The sum of all those distances is the value of this attribute for the sentence in 
question; 

7. Cohesion w.r.t. the Centroid: First the system computes the centroid of the 
document, which is simply a vector consisting of the arithmetic means of all 
sentence vectors’ elements. Then the value of this attribute for a given 
sentence is computed by calculating the similarity between the sentence and 
the centroid vector - again, using the co-sine measure. 
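
Three of the attributes above (Average-TF-ISF, Similarity to Title and Cohesion w.r.t. the Centroid) can be sketched as follows. The exact TF-ISF formula is given in [4] and is not reproduced in this section, so the form used here, TF(w,s) * log(|S| / SF(w)) averaged over the distinct terms of the sentence, is an assumption by analogy with TF-IDF; the raw term-count vectors and all function names are likewise illustrative.

```python
import math
from collections import Counter

def average_tf_isf(sentences):
    """sentences: list of token lists. Returns one score per sentence."""
    n = len(sentences)
    # SF(w): number of sentences in which term w occurs.
    sf = Counter(term for s in sentences for term in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)
        # Assumed form: TF-ISF(w, s) = TF(w, s) * log(n / SF(w)).
        values = [tf[w] * math.log(n / sf[w]) for w in tf]
        scores.append(sum(values) / len(values) if values else 0.0)
    return scores

def cosine(a, b):
    """Cosine measure between two term->weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Arithmetic mean of the sentence vectors, element by element."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the counts term by term
    return {t: w / len(vectors) for t, w in total.items()}

# Hypothetical three-sentence document and title.
sentences = [["genetic", "algorithm"], ["genetic", "summary"], ["text", "summary"]]
title = Counter(["genetic", "algorithm"])
vectors = [Counter(s) for s in sentences]
tf_isf = average_tf_isf(sentences)
title_similarity = [cosine(v, title) for v in vectors]
centroid_cohesion = [cosine(v, centroid(vectors)) for v in vectors]
```

Similarity to Keywords follows the same pattern with the keyword vector in place of the title vector.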

The next two attributes use a kind of linguistic structure built as an approximation 
to the text’s rhetorical tree. This structure is obtained by running a hierarchical 
clustering algorithm, which forms clusters of similar sentences based on the vectorial 
representation of the sentences. The output of the clustering algorithm is a clustering 
tree where the leaf nodes are sentences and internal nodes represent clusters that have 
more and more sentences as the root of the tree is approached. The root of the tree 
represents a single cluster with all sentences in the document. The similarity measure 
used by the clustering algorithm is, again, the co-sine measure. Once a clustering tree 
has been produced by the hierarchical clustering algorithm, that tree is used to 
compute the following attributes for each sentence:

8. The depth of the sentence in the tree, i.e., the number of nodes that are
ancestors of the leaf node representing that sentence.

9. The direction of the sentence in the tree, computed by following the path from 
the root towards the sentence up to depth four. At each depth level the 
direction can be Left, Right or None (in case the current level is greater than
the level of the sentence). This produces four attributes, each with one 
direction value. These attributes indicate the approximate position of the 
sentence in the rhetorical tree, incorporating linguistic knowledge into the set 
of predictor attributes. 
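
The depth and direction attributes can be read off a binary clustering tree as sketched below. The nested-tuple representation of the tree (an internal node is a pair, a leaf is a sentence identifier) and the function name are assumptions made for this illustration; the paper does not describe its data structures.

```python
def tree_attributes(tree, max_levels=4):
    """Return {sentence_id: (depth, [direction at levels 1..max_levels])}."""
    attributes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node: (left, right)
            left, right = node
            walk(left, path + ["Left"])
            walk(right, path + ["Right"])
        else:                            # leaf: a sentence identifier
            # Depth = number of ancestors = length of the root-to-leaf path.
            directions = path[:max_levels]
            # Levels below the sentence's own depth get the value None.
            directions += ["None"] * (max_levels - len(directions))
            attributes[node] = (len(path), directions)

    walk(tree, [])
    return attributes

# Hypothetical clustering tree over five sentences (0..4).
attrs = tree_attributes(((0, 1), (2, (3, 4))))
```

Each sentence thus receives one depth value and four direction values, which matches the five tree-based attributes counted in the text.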

The following attributes are obtained from the original text before the application 
of the preprocessing phase, and they also incorporate linguistic knowledge into the set 
of predictor attributes. 

10. Indicators of Main Concepts: these indicators are computed by using a 
morphological part-of-speech tagger that identifies nouns in the document. 
The motivation for focusing on nouns is that they tend to be more meaningful 
(at least as individual words) than other part-of-speech classes. The 15 most 
frequent nouns in the document are selected to be the indicators of main 
concepts. For each sentence, the value of this attribute is true if the sentence 
contains at least one of those 15 indicators, and false otherwise. 

11. Presence of Anaphors: From a linguistic point of view, the presence of 
anaphors in a sentence usually indicates that the information in the sentence 
is not essential, being used only to complement the information in a more 
relevant sentence. In ClassSumm the anaphors are identified by using a fixed 
list of words indicating anaphors. For each sentence, the value of this 
attribute is true if at least one of the first six words of the sentence is one of 
the words in the anaphor list, and false otherwise. 

12. Presence of Proper Nouns: This attribute is computed directly from the 
output of a part-of-speech tagger. The value of the attribute is true if the 
sentence contains at least one proper noun, and false otherwise. 

13. Presence of Discourse Markers: Some discourse markers, such as because, 
furthermore, also tend to indicate the presence of non-essential information. 
Discourse markers are identified by using a fixed list of words. The value of 
this attribute is true if the sentence contains at least one word in the list of 
discourse markers, and false otherwise. 
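
The two list-based binary attributes (11 and 13) can be sketched as follows. The word lists are hypothetical: the paper says fixed lists are used but does not give them, so only the three discourse markers quoted in the text are taken from the source.

```python
# Hypothetical anaphor list; the actual list used by ClassSumm is not given.
ANAPHORS = {"this", "that", "these", "those", "it", "he", "she", "they"}
# The three examples quoted in the text; the full list is not given.
DISCOURSE_MARKERS = {"because", "furthermore", "also"}

def words_of(sentence):
    # Lower-case and strip surrounding punctuation from each word.
    return [w.strip('.,;:!?"\'') for w in sentence.lower().split()]

def has_anaphor(sentence, window=6):
    # True if one of the first six words is in the anaphor list.
    return any(w in ANAPHORS for w in words_of(sentence)[:window])

def has_discourse_marker(sentence):
    # True if any word of the sentence is in the discourse-marker list.
    return any(w in DISCOURSE_MARKERS for w in words_of(sentence))

a1 = has_anaphor("This result was unexpected.")
a2 = has_anaphor("The committee met on Monday and later approved it.")
d1 = has_discourse_marker("Furthermore, sales fell.")
```

In the second example the anaphor occurs after the sixth word, so the attribute is false.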

Before the classification algorithm is applied to the training set, all the above non- 
binary attributes are normalized to the range [0..1] and then discretized. We adopt a 
simple “class-blind” discretization method, which consists of separating the original
values into equal-width intervals; this procedure has produced good results in our
previous experiments [5].
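
Normalization followed by equal-width discretization can be sketched as below. The number of intervals is not stated in this section, so `n_bins=3` and the example values are assumptions for illustration only.

```python
def normalize(values):
    """Min-max normalization of a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Map values in [0, 1] to interval indices 0..n_bins-1.

    The method is class-blind: interval limits ignore the class labels.
    The value 1.0 is placed in the last interval.
    """
    return [min(int(v * n_bins), n_bins - 1) for v in values]

# Hypothetical values of the Size attribute for five sentences.
sizes = [5, 12, 20, 8, 35]
bins = equal_width_bins(normalize(sizes), n_bins=3)
```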



4 Computational Results 

Previous work has reported results comparing ClassSumm with other Summarization 
methods [5], [6]. In those previous projects all original attributes were used. This 
paper focuses on a different issue. It investigates whether the performance of 
ClassSumm can be improved by using sophisticated attribute selection methods in a
preprocessing step. An attribute selection method outputs only a subset of relevant
attributes to be given to the classification algorithm, which hopefully will increase the 
predictive accuracy of the classification algorithm - which is also the accuracy of the 
decisions about which sentences should be included in the summary. The attribute 
selection methods used here are two kinds of Genetic Algorithms, namely a single- 
objective and a Multi-Objective Genetic Algorithm (MOGA) described in Section 2. 

Experiments were carried out with a document base formed by news extracted 
from The Wall Street Journal of the TIPSTER collection [10]. This collection is often 
used as a benchmark in the text summarization literature. 

For each document, a summary was produced using one of the following two 
approaches: (1) An automatically-generated summary, formed by the document’s 
sentences that are most similar (according to the co-sine measure) to the summary 
provided by the author of the text, following the procedure proposed by Mani and 
Bloedorn [9]. This kind of summary is called an “ideal automatic summary”. (2) A 
manually-generated summary, produced by an English teacher by selecting the most 
relevant sentences of the text. This is called an “ideal manual summary”. 

In all the experiments the training set consisted of 100 documents with their 
respective ideal automatic summaries. Experiments were carried out with two 
different kinds of test set. More precisely, in one experiment the test set consisted of 
100 documents with their respective ideal automatic summaries, and in another 
experiment the test set consisted of 30 documents with their ideal manual summaries. 
In all experiments the training set and the test set were, of course, disjoint sets of 
documents, since the goal is to measure the predictive accuracy (generalisation 
ability) in the test set, containing only examples unseen during training. 

In order to evaluate how effective Genetic Algorithm (GA)-based attribute 
selection is in improving the predictive accuracy of ClassSumm, two kinds of GAs for 
attribute selection have been used - both of them following the wrapper approach. 
The first one was the Multi-Objective GA (MOGA) discussed in Section 2. MOGA 
was used to select attributes for J4.8, a well-known decision-tree induction algorithm 
[21]. Recall that MOGA performs a multi-objective optimisation (in the Pareto sense) 
of both J4.8’s error rate and the decision tree size. The results of training J4.8 with the 
attributes selected by the MOGA were compared with the results of training J4.8 with 
all original attributes, as a control experiment. 
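
Optimisation “in the Pareto sense” over error rate and tree size can be illustrated with a dominance test. This sketch shows only the dominance relation, not the MOGA itself (which is described in Section 2); the candidate values are invented for the example.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates b.

    Each candidate is an (error_rate, tree_size) pair; both are minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(candidates):
    """Candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Hypothetical (error rate, tree size) pairs for five attribute subsets.
candidates = [(0.30, 7), (0.25, 15), (0.30, 12), (0.40, 5), (0.28, 15)]
front = pareto_front(candidates)
```

A subset such as (0.30, 12) is discarded because (0.30, 7) has the same error rate with a smaller tree, whereas (0.25, 15) and (0.40, 5) both survive as different trade-offs.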

The second kind of GA used in the experiments was a simpler GA, called
Single-Objective GA (SOGA). It optimises only the error rate of a classification algorithm.
SOGA was implemented directly from the MOGA implementation, by simply
modifying MOGA’s fitness function and selection method to optimise a single
objective. Due to the focus on a single objective, the classifier used in these
experiments was Naive Bayes, whose measure of performance involves only error
rate (no measure of size of the induced model). Again, the results of training Naive 
Bayes with the attributes selected by the SOGA were compared with the results of 
training Naive Bayes with all original attributes, as a control experiment. 

The results are reported in Tables 1 and 2, which refer to the results for the test sets 
containing ideal automatic summaries and ideal manual summaries (as explained 
earlier), respectively. Each of these tables reports results for two kinds of summary 
size (10% and 20% of the original document). Finally, for each kind of test set and 
each summary size, the results of four methods are compared - two methods using the 
classification algorithms (J4.8 and Naive Bayes) with all attributes and two methods 
using those algorithms with GA-based attribute selection, as explained above. In the
experiments with J4.8 the reported results include the accuracy in the test set (in the
range [0..1]), the decision tree size (number of tree nodes), and the number of selected 
attributes (or “all” - i.e., 16 attributes - when no attribute selection was done). In the 
experiments with Naive Bayes, of course only the accuracy and number of selected 
attributes are reported. 



Table 1. Results for test set containing “ideal” automatic summaries

Summary Size = 10% of original document

Method              Accuracy    Tree size    # selected attrib.
J4.8                0.18        42           All
MOGA-J4.8           0.33        7            4
Naive Bayes         0.39        N/a          All
SOGA-Naive Bayes    0.38        N/a          9

Summary Size = 20% of original document

Method              Accuracy    Tree size    # selected attrib.
J4.8                0.44        164          All
MOGA-J4.8           0.47        4            2
Naive Bayes         0.51        N/a          All
SOGA-Naive Bayes    0.52        N/a          11



Table 2. Results for test set containing “ideal” manual summaries

Summary Size = 10% of original document

Method              Accuracy    Tree size    # selected attrib.
J4.8                0.15        42           All
MOGA-J4.8           0.25        7            3
Naive Bayes         0.23        N/a          All
SOGA-Naive Bayes    0.22        N/a          12

Summary Size = 20% of original document

Method              Accuracy    Tree size    # selected attrib.
J4.8                0.33        164          All
MOGA-J4.8           0.35        4            1
Naive Bayes         0.36        N/a          All
SOGA-Naive Bayes    0.35        N/a          11



Several trends in the results can be observed in Tables 1 and 2. First, as expected, 
in both Table 1 and Table 2 the accuracy associated with the larger summaries (20% 
of original document) is considerably larger than the accuracy associated with the 
smaller summaries (10% of the original document). This reflects the fact that, as the 
size of the summary increases, the classification problem becomes easier - e.g., the 
class distribution becomes less unbalanced (i.e., closer to a 50-50% class distribution).

Second, the GA-based attribute selection procedure had different effects on the 
performance of ClassSumm, depending on the kind of GA and classifier used in the 
experiments. The use of MOGA-based attribute selection led to an increase in J4.8’s
accuracy. This increase was very substantial in the smaller (10%) summaries, but
relatively small in the larger (20%) summaries. In addition, MOGA-based attribute 
selection was very effective in selecting a very small number of attributes, which led 
to a very significant reduction in the size of the induced decision tree. This 
significantly improves the comprehensibility of discovered knowledge, an important 
goal in data mining [2], [21]. 

On the other hand, SOGA-based attribute selection did not improve Naive Bayes’
accuracy. Its effect was very small (at most 0.01 in either direction) and, overall,
even slightly negative.

These results for the two kinds of GA-based attribute selection are qualitatively
similar in both Table 1 and Table 2, so they do not depend on whether the test
set contains automatic summaries or manual summaries.



5 Conclusions and Future Work 

As mentioned earlier, the goal of this paper was to investigate the effectiveness of 
Genetic Algorithm (GA)-based attribute selection in improving the performance of 
classification algorithms solving the automatic text summarization task. Overall, the 
two main conclusions of this investigation were as follows. 

First, the Multi-Objective GA (MOGA) was quite effective. It led to an increase in 
the accuracy rate of the decision tree-induction algorithm used as a classifier, with a 
corresponding increase in the accuracy of the text summarization system. It also led to 
a very significant reduction in the size of the induced decision tree. Hence, the multi- 
objective component of the GA, which aims at optimising both accuracy and tree size, 
is working well. 

Second, the Single-Objective GA (SOGA), which aimed at optimising 
classification accuracy only, was not effective. Surprisingly, there was no significant 
difference in the results of Naive Bayes with all attributes and the results of Naive 
Bayes with only the attributes selected by this GA. This indicates that all the original 
attributes seem more or less equally relevant for the Naive Bayes classifier. 

It should be noted that there is a lot of room for improvement in the results of the 
system, since the largest accuracy rate reported in Tables 1 and 2 was only 52%. 
Despite all the effort put into the design and computation of the 16 predictor attributes 
(which involve not only heuristic and statistical indicators, but also several relatively 
sophisticated linguistic concepts - e.g. a rhetorical tree), the current attribute set still 
seems to have a relatively limited predictive power. This suggests that future work 
could focus on designing an extended set of predictor attributes with more predictive 
power than the current one. Considering the difficulty of doing this in a manual
fashion, one interesting possibility is to use attribute construction methods to 
automatically create a better set of predictor attributes. 



References 

1. Deb, K.: Multi- Objective Optimization using Evolutionary Algorithms. John Wiley & 
Sons (2001) 

2. Freitas, A. A.: Data Mining and Knowledge Discovery with Evolutionary Algorithms. 
Springer (2002) 







3. Kudo, M., Sklansky, J.: Comparison of algorithms that select features for pattern
classifiers. Pattern Recognition 33 (2000) 25-41

4. Larocca Neto, J., Santos, A.D., Kaestner, C.A.A., Freitas, A.A.: Document clustering and
text summarization. In: Proc. 4th Int. Conf. Practical Applications of Knowledge Discovery
and Data Mining. (2000) 41-55

5. Larocca Neto, J., Freitas, A.A., Kaestner, C.A.A.: Automatic text summarization using a
machine learning approach. In: XVI Brazilian Symposium on Artificial Intelligence.
Number 2057 in Lecture Notes in Artificial Intelligence, Springer (2002) 205-215

6. Larocca Neto, J.: A Contribution to the Study of Automatic Text Summarization
Techniques (in Portuguese). Master's thesis, Pontifícia Universidade Católica do Paraná
(PUC-PR), Graduate Program in Applied Computer Science. (2002)

7. Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. 
Kluwer Academic Publishers (1998) 

8. Lyman, P., Varian, H.R.: How much information. (Retrieved from
http://www.sims.berkeley.edu/how-much-info-2003 on [01/19/2004])

9. Mani, I., Bloedorn, E.: Machine learning of generic and user-focused summarization. In:
Proc. of the 15th National Conf. on Artificial Intelligence (AAAI 98). (1998) 821-826

10. Mani, I., House, D., Klein, G., Hirschman, L., Obrst, L., Firmin, T., Chrzanowski, M.,
Sundheim, B.: The tipster summac text summarization evaluation. MITRE Technical
Report MTR 92W0000138, The MITRE Corporation (1998)

11. Mani, I., Maybury, M.T.: Advances in Automatic Text Summarization. MIT Press (1999)

12. Mani, I.: Automatic Summarization. John Benjamins Publishing Company (2001)

13. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)

14. Nevill-Manning, C.G., Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C.: KEA: Practical
Automatic Keyphrase Extraction. ACM DL 1999 (1999) 245-255

15. Pappa, G.L., Freitas, A.A., Kaestner, C.A.A.: Attribute selection with a multiobjective 
genetic algorithm. In: XVI Brazilian Symposium on Artificial Intelligence. Number 2057 
in Lecture Notes in Artificial Intelligence, Springer (2002) 280-290 

16. Pappa, G.L., Freitas, A.A., Kaestner, C.A.A.: A multi-objective genetic algorithm for
attribute selection. In: Proc. 4th Int. Conf. on Recent Advances in Soft Computing
(RASC-2002), University of Nottingham, UK (2002) 116-121

17. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)

18. Silla Jr., C.N., Kaestner, C.A.A.: An Analysis of Sentence Boundary Detection Systems
for English and Portuguese Documents. In: 5th International Conf. on Intelligent Text
Processing and Computational Linguistics. Number 2945 in Lecture Notes in Computer
Science, Springer (2004) 135-141

19. Sparck-Jones, K.: Automatic summarizing: factors and directions. In: Mani, I., Maybury,
M.: Advances in Automatic Text Summarization. The MIT Press (1999) 1-12

20. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval.
Information Processing and Management 24 (1988) 513-523

21. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations. Morgan Kaufmann, San Francisco (1999)

22. Zhong, N., Liu, J., Yao, Y.: In search of the wisdom web. IEEE Computer 35(1) (2002) 
27-31 




Coordination Revisited — A Constraint Handling 

Rule Approach 



Dulce Aguilar-Solis and Veronica Dahl 



Logic and Functional Programming Group, 
Department of Computing Science, 
Simon Fraser University, 

Burnaby, B.C., Canada 
{dma, veronica}@cs.sfu.ca 



Abstract. Coordination in natural language (as in “Tom and Jerry”, 
“John built but Mary painted the cupboard”, “publish or perish”) is 
one of the most difficult computational linguistic problems. Not only 
can it affect any type of constituent, but it often involves “guessing” 
material left implicit. We introduce a CHR methodology for extending 
a user’s grammar not including coordination with a metagrammatical 
treatment of same category coordination. Our methodology relies on the 
input grammar describing its semantics compositionally. It involves reify- 
ing grammar symbols into arguments of a generic symbol constituent, 
and adding three more rules to the user’s grammar. These three rules 
can coordinate practically any kind of constituent while reconstructing 
any missing material at the appropriate point. With respect to previous 
work, this is powerfully laconic as well as surprisingly minimal in the 
transformations and overhead required. 



1 Introduction 

The present work forms part of our wider efforts to make spoken communication 
with computers a closer goal. We take inspiration from database theory (in par- 
ticular, Datalog) and from constraint reasoning (in particular, CHRs) because 
the inherent ambiguity of natural language, and the frequency of errors that 
appear particularly in spoken or colloquial input, suggests bottom-up parsing 
techniques as the most promising. We should be able to gather, from an attempt 
to produce a correct parse, information which does not necessarily conform to 
strict grammar rules, and if possible, interpret that information in ways which 
allow us to somehow make sense of this “imperfect” input, just as humans do. 
Coordination adds one more level of difficulty, often involving implicit
information that needs to be reconstructed from other conjoints (as in “The workshop
was interesting and the talks, amusing”, in which the verb of the second conjoint
(were) is implicit).

The advantages of bottom-up approaches to NL processing aimed at flexibil- 
ity have been demonstrated within declarative paradigms for instance for parsing 
incorrect input and for detecting and correcting grammatical errors [2, 5, 6, 17].
Most of these approaches use constraint reasoning of some sort, sometimes
blended with abductive criteria. For the specific problem of coordination in natural
language, datalog approaches with constraints placed upon word boundaries
have proven to be particularly adequate [11, 22].

CHRs [15] are of great interest to datalog inspired treatments of NL because, 
as shown in [6], the datalog part requires very little implementation machinery 
when using CHRGs: basically, grammar rules in CHRGs can be made to work 
either top-down or bottom-up according to the order in which their left and right 
hand sides are given, and CHRGs include an automatic insertion and handling 
of word boundaries [7, 9]. 

In this article we investigate the possibilities of blending CHRGs with datalog 
grammars, in view of a metagrammatical treatment of coordination. In our ap- 
proach, the use of CHRGs together with a compositional meaning construction 
scheme is sufficient to understand (in the sense of machine understanding, i.e., to
produce adequate meaning representations from) coordinated sentences, using a 
grammar which makes no explicit mention of coordination (other than declaring 
in its lexicons which are the allowed conjunctions). We choose a lambda-calculus 
based meaning representation to exemplify. 

As in [2] or [5], we can build non-connected structures showing all partial 
successful analyses (but without having to replace trees by graphs as in that 
work). The result is a surprisingly simple approach, in which datalog grammars
are given “for free” in the normal operation of CHRGs in conjunction with our 
transformation of the grammar given. 



2 Background 

2.1 Previous Approaches to Coordination 

Typically, treatments of coordination involve either enriching the linguistic
representation using explicit rules of grammar or adding special mechanisms to the
parsing algorithms to handle conjunction. The latter kind of approach has been
used by Woods [23], Dahl and McCord [13], Haugeneder [16] and Milward [19],
among others.

In recent work [11], Dahl provided a left-corner plus charting datalog approach
which can determine the parallel structures automatically, taking into
account both syntactic and semantic criteria. This methodology records parse 
state constituents through linear assumptions to be consumed as the correspond- 
ing constituents materialize throughout the computation. Parsing state symbols 
corresponding to implicit structures remain as undischarged assumptions, rather 
than blocking the computation as they would if they were subgoals in a query. 
They can then be used to glean the meaning of elided structures, with the aid 
of parallel structures. 

While being quite minimalistic in the amount of code required, this approach 
involves sophisticated process synchronization notions, as well as the use of linear 
affine implication. In the present work we show how to retain the advantages of 
[11] while drastically simplifying our methodology.

2.2 Constraint Handling Rules

CHR [15] is a committed-choice language for writing constraint solvers to be
used with Prolog programs within domains such as real or integer numbers,
as well as within less traditional domains, such as terminological and
temporal reasoning. It is incorporated as an extension of (among others)
SICStus Prolog (footnote 1).

As summarized in the CHR website (footnote 2), “CHR consist essentially of guarded
rules that rewrite constraints into simpler ones until all the constraints have 
been solved. They define both simplification of and propagation over constraints. 
Simplification replaces constraints by simpler constraints while preserving logical 
equivalence (e.g. X > Y, Y > X <=> fail). Propagation adds new constraints 
which are logically redundant but may cause further simplification (e.g. X > 
Y,Y > Z ==> X > Z). Repeatedly applying CHR incrementally simplifies and 
finally solves constraints (e.g. A > B, B > C,C > A leads to fail).” 

CHR works on constraint stores with its rules interpreted as rewrite rules over
such stores. A string to be analyzed such as “the sun shines” is entered as a
sequence of constraints {token(0,1,the), token(1,2,sun), token(2,3,shines)} that
comprises an initial constraint store. The integer arguments represent word
boundaries, and a grammar for this intended language can be expressed in CHR as follows.

token(X0,X1,the) ==> det(X0,X1,_).
token(X0,X1,sun) ==> n(X0,X1,sing).
token(X0,X1,shines) ==> v(X0,X1,sing).
n(X0,X1,Num), v(X1,X2,Num) ==> s(X0,X2,Num).

It is to be noted that if the application of a rule adds a constraint c to the 
store which already is there, no additional rules are triggered, e.g., p==>p does 
not loop as it is not applied in a state including p. 
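
The bottom-up behaviour of these propagation rules can be imitated in a few lines of Python. This is only an illustrative sketch, not CHR: the constraint store is modelled as a set of tuples, the anonymous variable `_` as `None`, and rules are re-applied until no new constraint appears, which mirrors the remark that re-adding an existing constraint triggers nothing. The sentence constraint spans from the start of the noun to the end of the verb.

```python
# Toy lexicon standing in for the three token rules: word -> (category, number).
LEXICON = {"the": ("det", None), "sun": ("n", "sing"), "shines": ("v", "sing")}

def parse(words):
    # Initial store: token(Start, End, Word), with integer word boundaries.
    store = {("token", i, i + 1, w) for i, w in enumerate(words)}
    while True:
        new = set()
        # Lexical rules: token(X0,X1,W) ==> Cat(X0,X1,Num).
        for c in store:
            if c[0] == "token" and c[3] in LEXICON:
                cat, num = LEXICON[c[3]]
                new.add((cat, c[1], c[2], num))
        # Sentence rule: an n followed by an adjacent v with the same number.
        for n in store:
            for v in store:
                if (n[0] == "n" and v[0] == "v"
                        and n[2] == v[1] and n[3] == v[3]):
                    new.add(("s", n[1], v[2], n[3]))
        if new <= store:      # nothing new: fixpoint reached
            return store
        store |= new

final_store = parse(["the", "sun", "shines"])
```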

CHR applications to natural language processing include [1], which flexibly 
combines top-down and bottom-up computation in CHR, [6], which uses CHR to
diagnose and correct grammatical errors, and [10], which implements property 
grammars using CHR. 

2.3 Constraint Handling Rule Grammars 

CHRGs, or Constraint Handling Rule Grammars, were developed by H. Christiansen
(footnote 3) as a grammatical counterpart for CHRs. A CHRG consists of finite sets
of grammar and constraint symbols and a finite set of grammar rules. A grammar
symbol is formed as a logical atom whose predicate symbol is a grammar
symbol; a grammar symbol formed by token/1 is called a terminal, any other
grammar symbol a nonterminal. A propagation (grammar) rule is of the form

α -\ β /- γ ::> G | δ.



1 http://www.sics.se/isl/sicstuswww/site/documentation.html

2 http://www.informatik.uni-ulm.de/pm/mitarbeiter/fruehwirth/chr.html 

3 http://www.dat.ruc.dk/~henning/chrg/

The part of the rule preceding the arrow is called the head, G the guard, and
δ the body; α, β, γ, δ are sequences of grammar symbols and constraints so that
β contains at least one grammar symbol, and δ contains exactly one grammar
symbol which is a nonterminal (and perhaps constraints); α (γ) is called left
(right) context and β the core of the head; G is a guard as in CHR that may
test properties for the variables in the head of the rule. If either the left or the
right context is empty, the corresponding marker is left out and if G is empty
(interpreted as true), the vertical bar is left out. The convention from DCG
is adopted that non-grammatical constraints in head and body of a rule are
enclosed in curly brackets.

CHRG rules can be combined with rules of CHR and with Prolog, which is 
convenient for defining the behaviour of non-grammatical constraints. CHRG in- 
cludes also notation for gaps and for parallel matching, which we neither describe 
nor use in the present work. 

The CHRG notation makes the word boundary arguments implicit and, anal- 
ogously to DCGs, includes a syntax for using nongrammatical constraints. 

As demonstrated by [7, 8], CHRG is a very powerful system which provides
straightforward implementation of assumption grammars [12], abductive lan- 
guage interpretation and a flexible way to handle interesting linguistic phenom- 
ena such as anaphora and a restricted form of coordination (specifically, for 
coordinating subject and object proper names). This work motivated us to try 
virtual coordination through CHRGs at the general level. 

3 Our Approach to Coordination 

3.1 General Description 

Our methodology imposes two requirements on the user’s grammar. First, se- 
mantics must be defined compositionally. Second, semantic material must be 
isolated into one specific argument, so that the rules for virtual coordination can 
easily identify and process it. For instance, if we use the second argument for 
semantics, a rule such as 

name(X) --> np(X^Sem^Sem).

should be coded in CHR and CHRG as

CHR: category(name,X,P0,P1) ==> constituent(np,X^Sem^Sem,P0,P1).

CHRG: category(name,X) ::> constituent(np,X^Sem^Sem).

We assume that there are two (implicit or explicit) coordinating constituents,
C1 and C2, surrounding the conjunction, which must in general be of the same
category (footnote 4). As in Dahl’s previous work [11], we adopt the heuristics that closer
scoped coordinations will be attempted before larger scoped ones. Thus in Woods’
well-known example [23], “John drove his car through and demolished a window”,
"vp conj vp" is tried before "sent conj sent".



4 This is a simplifying assumption: coordination can involve different categories as 
well, but in this paper we only address same category coordination.

If both C1 and C2 are explicit, we simply postulate another constituent of
the same category covering both, and conjoin their meanings. This is achieved
through the rule:

constituent(C,Sem1,P0,P1),
constituent(conj,_,P1,P2),
constituent(C,Sem2,P2,P3) ==>
    constituent(C,Sem,P0,P3), conj(Sem1,Sem2,Sem).

Sem can take either the form and(Sem1,Sem2) or a more complex form.

If either C1 or C2 is implicit, the CHRG engine will derive all possible partial
analyses and stop with no complete sentence having been parsed. We can then 
resume the process after dynamically adding the following symmetric rules, in 
charge of completing the target in parallel with the source. Once the target has 
been completed, the above rule for coordinating complete constituents can take 
over.

% The second conjoint is incomplete
constituent(C,Sem1,P0,P1),
constituent(conj,_,P1,P2) ==>
    complete(C,Sem1,range(P0,P1),Sem2,P3)
    | constituent(C,Sem2,P2,P3).

% The first conjoint is incomplete
constituent(conj,_,P1,P2),
constituent(C,Sem2,P2,P3) ==>
    complete(C,Sem2,range(P2,P3),Sem1,P0)
    | constituent(C,Sem1,P0,P1).

complete/5 generates the features of a constituent of category C that is incom- 
plete between the given points, using the source (the portion of text indicated 
by range/2) as parallel structure. The new constraint is left in the constraint 
store, so that the rule for complete conjoint coordination can apply. 

3.2 An Example 

For "Philippe likes salsa and Stephane tango", we have the initial constraint store:
{philippe(0,1), likes(1,2), salsa(2,3), and(3,4), stephane(4,5), tango(5,6)}, to
which the following "constraints" (in the sense of the CHR store) are
successively added: [name(0,1), verb(1,2), noun(2,3), conj(3,4), name(4,5),
noun(5,6), np(0,1), vp(1,3), np(2,3), np(4,5), np(5,6), s(0,3)].

At this point, since the analysis of the entire sentence is not complete, the 
dynamically added rules will compare the data to the left and right of the con- 
junction, noting the parallelisms present and absent: 

np(0,1) parallel to np(4,5)
verb(1,2) parallel to ?
np(2,3) parallel to np(5,6)

and postulate a verb(5,5), with the same surface form ("likes") as the verb in
the parallel structure. The addition of this element to the constraint store triggers
the further inferences: {vp(5,6), s(4,6)}. This in turn will trigger the complete
constituent coordination rule, resulting in {s(0,6)}.
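The pairing step in this example can be sketched as a left-to-right alignment. This is a simplified Python illustration under stated assumptions, not the behaviour of complete/5: constituents are assumed to be (category, start, end) triples, the target list is assumed to be non-empty, and find_missing is a hypothetical helper name.

```python
def find_missing(source, target):
    """Pair source and target constituents left to right.

    Returns (pairs, postulated); postulated items are zero-width spans
    placed where the next unmatched target constituent begins.
    """
    pairs, postulated = [], []
    j = 0
    for cat, start, end in source:
        if j < len(target) and target[j][0] == cat:
            pairs.append(((cat, start, end), target[j]))
            j += 1
        else:
            # No counterpart on the other side of the conjunction:
            # postulate an empty constituent of the same category.
            pos = target[j][1] if j < len(target) else target[-1][2]
            postulated.append((cat, pos, pos))
    return pairs, postulated


source = [("np", 0, 1), ("verb", 1, 2), ("np", 2, 3)]   # Philippe likes salsa
target = [("np", 4, 5), ("np", 5, 6)]                    # Stephane tango
pairs, postulated = find_missing(source, target)
```

For this input the only postulated element is a verb spanning (5,5), matching the verb(5,5) of the example.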

3.3 Top-Down Prediction 

For cases in which the two parallel structures are different, the simple pairing of 
constituents in the source and target, as above, will not be enough. For Woods’ 
example, “John drove a car through and demolished a window”, we have the con- 
straint store: 

{ john(0,1), drove(1,2), a(2,3), car(3,4),
through(4,5), and(5,6), demolished(6,7),
a(7,8), window(8,9), name(0,1), verb(1,2),
det(2,3), noun(3,4), prep(4,5), conj(5,6),
verb(6,7), det(7,8), noun(8,9), np(0,1),
np(7,9), vp(6,9) }

In such cases we examine the grammar rules in top-down fashion to determine 
which ones would warrant the completion of a constituent that appears to one 
side of the conjunction but not to the other. 

In our example, the candidate sources are: prep(4,5), verb(6,7) and vp(6,9). 
Postulating a missing prep at point 6 or a missing verb at point 5 does not lead 
to success, so we postulate a vp ending in point 5 and use vp rules top-down to 
predict any missing constituents. The pp rule is eventually found, which rewrites 
pp into a preposition plus a noun phrase, so we can postulate a missing noun 
phrase between point 5 and itself, to be filled in by the parallel noun phrase in 
the source, namely “a window” (we must avoid requantification, so the window 
driven through and the one demolished must be the same window) . This new noun 
phrase triggers in turn the inference of a verb phrase between points 1 and 5, 
which in turn allows us to conjoin the two verb phrases, and complete the analysis. 

3.4 Semantics 

We now turn our attention to grammars which construct meaning representa- 
tions for the sentences analysed. 

After having determined the parallel elements in source and target, we must
void the source's meaning (using for instance higher-order unification) from those
elements which are not shared with the target, and only then apply the resulting
property on the representation of the target.

For our example, we must from the meaning representation of "Philippe likes
salsa" reconstruct the more abstract property [λy.λx.likes(x,y)], which can then
be applied on "tango" and "stephane" to yield likes(stephane,tango).

We bypass this need by keeping copies of the abstract properties as we
go along. While the original properties get instantiated during the analysis
(so that the meaning of "likes" in the context "Philippe likes salsa" becomes
likes(philippe,salsa)), their copies do not. It is these uninstantiated copies
that are used as meanings for the reconstructed targets.
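The idea of reusing an uninstantiated copy can be mimicked with closures. The sketch below is only an analogy in Python, assuming meanings are represented as tuples; in the grammar itself the copies are Prolog terms with unbound variables.

```python
def likes(y):
    """Abstract property corresponding to [λy.λx.likes(x,y)]."""
    return lambda x: ("likes", x, y)


# The property itself is never consumed, so it can be applied once for
# the source clause and again for the reconstructed target clause.
source_meaning = likes("salsa")("philippe")
target_meaning = likes("tango")("stephane")
conjoined = ("and", source_meaning, target_meaning)
```

Applying the same abstract property twice plays the role of the uninstantiated copy: the source instantiation does not interfere with the target one.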

3.5 Syntactic Considerations

Some of the syntactic features carried around by typical grammars need a special 
treatment by the coordination rules: the conjunction of two singular noun phrases 
(e.g. “the cat and the dog”), for instance, should result in a plural conjoined noun 
phrase. Our system takes this into account. 

4 Related and Future Work

Various authors, e.g. [18] discuss an alternative approach to anaphoric dependen- 
cies in ellipsis, in which the dependence between missing elements in a target 
clause and explicit elements in a source clause does not follow from some uni- 
form relation between the two clauses, but follows indirectly from independently 
motivated discourse principles governing pronominal reference. While contain- 
ing linguistically deep discussions, the literature on this discourse-determined 
analysis also focusses on ellipsis resolution, while still leaving unresolved (to the 
best of our knowledge) the problem of automatically determining which are the 
parallel structures. 

On another line of research, Steedman's CCGs [21] provide an elegant treatment
of a wide range of syntactic phenomena, including coordination, which
does not resort to the notions of movement and empty categories, instead using
limited combinatory rules such as type raising and functional composition.
However, these are also well known to increase the complexity of parsing, giving
rise to spurious ambiguity, that is, the production of many irrelevant syntactic
analyses as well as the relevant ones. Extra work for getting rid of such ambiguity
seems to be needed, e.g. as proposed in [20].

The results in [14] concerning the use of the distinction between primary 
and secondary occurrences of parallel elements in order to provide ambiguous 
readings of discourses such as “Jessie likes her brother. So does Hannah.” could, in 
principle, be transferred into our approach as well. 

A recent line of work in Computational Linguistics, the property-based
paradigm [3, 4], departs from the Chomskyan, hierarchical approach to parsing
in that it accepts input through checking properties between any two words or
constituents modularly, and relaxing some of the properties to accommodate
ill-formed or incomplete input. Properties are statically declared as either necessary
or desirable. In this approach, a sentence such as “The workshops was interesting” 
may be accepted, while producing an indication that the property of number 
agreement between the noun phrase and the verb is not satisfied (as we might 
want for instance in a language tutoring system). Thus it is not necessary to have 
a complete parse tree to get interesting results: incomplete or incorrect input will 
yield partial results, and the users’ static indications of which properties can be 
relaxed and which should be enforced will result in indications by the system of 
which properties are not satisfied for a given input. 

A CHR methodology for parsing in the property based paradigm has been 
shown in [10], which proceeds from head words and projects them into phrases 
(e.g., a noun can form a noun phrase all by itself). Then it tries to incorporate 
more constituents into this phrase, one at a time, by testing properties between
the constructed phrase and the potential addition. This results in basically a 
one-rule parser, whose single rule successively takes two categories, checks the 
properties between them, and constructs a new category. This process repeats 
until no more new categories can be inferred. We are currently working on ex- 
tending this parser with coordinating abilities as described in this paper, both 
as a further proof of concept of the ideas here developed, and as an interesting 
approach to treating long distance dependencies in property based grammars. 

5 Concluding Remarks 

We have shown how to take advantage of CHRGs’ built-in datalog facilities and 
constraint store management in order to treat metagrammatical coordination in 
a relatively simple while effective and encompassing manner. 

A few observations are in order: as we have seen, a simple conjoining of the 
representations obtained for the parallel structures as proposed in [14] may not 
suffice. Since these structures may be quite dissimilar, we must conjoin only the 
parallel elements. We postulate that, in compositionally defined semantics, the 
parallel elements will be represented by those subterms which are not unifiable.

Another important observation is that we do not need to commit to higher- 
order unification, property reconstructions, etc. Again for compositionally de- 
fined semantics, the parallel structures analysis together with top-down target 
determination ensures that the correct meanings are associated to the elided 
constituents as a side effect of parsing. 

Lastly, we should note that our analysis allows for the source clause to not
necessarily be the first one: as we have also seen, we can have structures in
which the incomplete substructure does not antecede the complete one. Thus
our analysis can handle more cases than those in the previous related work.

Appendix: A Sample Grammar

compile(chrg).
handler parser.
grammar_symbols c/3.

c(C,F1,X), c(con,[],Con), c(C,F2,Y) ::>
    conjunction_agree(Con,C,F1,F2,F), collapse(C,X,Y,Z)
    | c(C,F,Z).

% The second conjoint is incomplete
c(C,T,Sem1):(P0,P1), c(con,_,_):(P1,P2) ::>
    completar(C,T,Sem1,range(P0,P1),Sem2,P3,Ts)
    | c(C,Ts,Sem2):(P2,P3).

% The first conjoint is incomplete
c(con,_,_):(P1,P2), c(C,T,Sem1):(P2,P3) ::>
    completar(C,T,Sem1,range(P2,P3),Sem2,P0,Ts)
    | c(C,Ts,Sem2):(P0,P1).

c(np,[num:N],X^Sco^Sem), c(vp,[ten:T1,num:N1],X1^Sco1) ::>
    apply((N=N1),N,NewN), apply((X=X1,Sco=Sco1),Sem,NewSem)
    | c(s,[num:NewN],NewSem).

c(tv,[ten:T,num:N],Sem), c(np,[num:_],Y1^Sco1^Pred), c(pp,Prep,NP) ::>
    add_pp(Prep,Sem,NP,Y^X^Sco),
    apply((Y=Y1,Sco=Sco1),X^Pred,NewSem)
    | c(vp,[ten:T,num:N],NewSem).

c(tv,[ten:T,num:N],Y^X^Sco), c(np,[num:_],Y1^Sco1^Pred) ::>
    apply((Y=Y1,Sco=Sco1),X^Pred,NewSem)
    | c(vp,[ten:T,num:N],NewSem).

c(iv,F,Sem), c(pp,Prep,NP) ::>
    add_pp(Prep,Sem,NP,NewSem)
    | c(vp,F,NewSem).

c(iv,F,Sem) ::> c(vp,F,Sem).

c(name,[num:N],X) ::> c(np,[num:N],X^Sem^Sem).

c(det,[num:N],X^Res^Scope^Sem), c(noun,[num:N1],X1^Res1) ::>
    apply((N=N1),N,NewN),
    apply((X=X1,Res=Res1),X^Scope^Sem,NPsemanticTerm)
    | c(np,[num:NewN],NPsemanticTerm).

c(prep,Prep), c(np,_,NP) ::> c(pp,Prep,NP).

Abstract. Question Answering has become a promising research field whose
aim is to provide more natural access to information than traditional document
retrieval techniques. In this work, an approach centered on the use of context
at a lexical level has been followed in order to identify possible answers to
short factoid questions stated by the user in natural language. The methods
applied at the different stages of the system, as well as an architecture for
question answering, are described. The evaluation of this approach was made following
the QA@CLEF03 criteria on a corpus of over 200,000 news documents in Spanish. The
paper shows and discusses the results achieved by the system.

Keywords: Question Answering, Automatic Text Processing, Natural Language 
Processing. 



1 Introduction 

Question Answering (QA) systems have become an alternative to traditional information
retrieval systems because of their capability to provide concise answers to questions
stated by the user in natural language. This fact, along with the inclusion of QA
evaluation as part of the Text Retrieval Conference (TREC) 1 in 1999, and recently [6]
of Multilingual Question Answering as part of the Cross Language Evaluation Forum
(CLEF) 2, has given rise to a promising and growing research field.

Nowadays, the state of the art in QA systems is focused on the resolution of factual
questions [2, 14] that require a named entity (date, quantity, proper noun, locality, etc.)
as response. For instance, the question "¿Cuándo decidió Naciones Unidas imponer
el embargo sobre Irak?" 3 demands as answer a date, "en agosto de 1990" 4. Several
QA systems, such as [8, 13, 4, 10], use named entities at different stages of



† This work was done while visiting the Dept. of Information Systems and Computation,
Polytechnic University of Valencia, Spain.

1 http://trec.nist.gov/ 

2 http://clef-qa.itc.it/ 

3 When did the United Nations decide to impose the embargo on Iraq? 

4 In August 1990. 

C. Lemaître, C.A. Reyes, and J.A. González (Eds.): IBERAMIA 2004, LNAI 3315, pp. 325-333, 2004.
the system in order to find a candidate answer. Generally speaking, named
entities are used at the final stages of the system, i.e., either in passage selection
or as a discriminator in order to select a candidate answer at the final stage.
Another interesting approach is the use of Predictive Annotation, which was first
presented at TREC-8 by Prager et al. [8]. One meaningful characteristic of this approach
is the indexing of anticipated semantic types, identifying the semantic type of the
answer sought by the question, and extracting the best matching entity in candidate
answer passages. In their approach, the authors used no more than simple pattern
matching to get the entities. The system described in this document was developed to
process both questions and source documents in Spanish. Our system is based on
the approach just described but differs in the following: i) the identification of the
semantic classes relies on the preprocessing of the whole document collection by a POS
tagger that simultaneously works as named entity recognizer and classifier; ii) the
indexing stage takes as item the lexical context associated with each single named entity
contained in every document of the collection; iii) the searching stage selects as
candidate answers those named entities whose lexical contexts best match the context
of the question; iv) at the final stage, candidate answers are compared against a
second set of candidates gathered from the Internet; v) final answers are selected based
on a set of relevance measures which encompass all the information collected in the
searching process. The evaluation of the system was made following the methodology
and data set of QA@CLEF-2003 [6] in order to get a comparable evaluation with
other systems designed for the Spanish language.

The rest of this paper is organized as follows: section two describes the architecture
and functionality of the system; section three details the process of question
processing; section four details the process of indexing; section five shows the process
of searching; section six describes the process of answer selection; section seven
discusses the results achieved by the system; and finally section eight presents our
conclusions and discusses further work.

2 System Overview 

The system follows a typical QA system architecture [14]. Figure 1 shows the main
blocks of the system. The system can be divided into the following stages: question
processing, which involves the extraction of named entities and lexical context from the
question, as well as question classification to define the semantic class of the answer
expected to respond to the question; indexing, where the supporting document
collection is preprocessed, building the representation of each document that
becomes the search space in which to find candidate answers to the question; searching,
where a set of candidate answers is obtained from the index and the Internet (here
candidate answers are classified by a machine learning algorithm, which provides
information to perform different weighting schemes); and finally answer selection,
where candidate answers are ranked and the final answer recommendation of the
system is produced. The next sections describe each of these stages.

Fig. 1. Block diagram of the system. There are four stages: question processing, indexing,
searching and answer selection



3 Question Processing 

MACO [3] is a POS tagger and lemmatizer capable of recognizing and classifying 
named entities (NEs). The possible categories for NEs are the following: person, 
organization, geographic place, date, quantity and miscellaneous. In order to reduce 
the possible candidate answers provided by our system we perform a question classi- 
fication process. The purpose of this classification is to match each question with one 
of the six named entities provided by MACO. 

We use a straightforward approach, where the attributes for the learning task are 
the prefixes of the words in the question and additional information acquired by an 
Internet search engine. 

The procedure for gathering this information from the Internet is as follows. First, we
use a set of heuristics to extract from the question the first noun word or words w. We
then employ a search engine, in this case Google, submitting queries using the word w
in combination with the five possible semantic classes. For instance, for the question
"Who is the President of the French Republic?", President is extracted as the noun in the
question using our heuristics, and we run 5 queries in the search engine, one for each
possible class. The queries take the following forms:

• "President is a person"

• "President is a place"

• "President is a date"

• "President is a measure"

• "President is an organization"

For each query (q_i) the heuristic takes the number of results (Cr_i) returned by
Google and normalizes them according to Equation 1. This means that, for each question,
the sum of the normalized values of its five queries is 1. The normalized values (I_w(q_i))
are taken as attribute values for the learning algorithm. As can be seen, this is a very
direct approach, but experimental evaluations showed that the information gathered
from the Internet is quite useful [11].

The machine learning technique used was Support Vector Machines [12], as implemented
in WEKA [15].

I_w(q_i) = \frac{Cr_i}{\sum_{j} Cr_j}    (Equation 1)

where the sum runs over the five queries issued for the question.
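The normalization of Equation 1 is straightforward to reproduce. The following Python sketch uses made-up hit counts purely for illustration; they are not results reported by any search engine, and normalize_hits is a hypothetical helper name.

```python
def normalize_hits(hits):
    """Divide each result count by the total over all queries (Equation 1)."""
    total = sum(hits.values())
    if total == 0:
        # No query returned results: every normalized value is zero.
        return {cls: 0.0 for cls in hits}
    return {cls: count / total for cls, count in hits.items()}


# Hypothetical result counts for the five "President is a ..." queries.
hits = {"person": 600, "place": 100, "date": 50, "measure": 50, "organization": 200}
features = normalize_hits(hits)
```

The five resulting values sum to 1 and can be appended to the word-prefix attributes as features for the classifier.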



4 Indexing 

Each document in the collection is modeled by the system as a factual text object whose
content refers to several named entities even when it is focused on a central topic. As
mentioned, named entities can be one of these objects: persons, organizations,
locations, dates and quantities. The model assumes that the named entities are strongly
related to their lexical context, especially to nouns (subjects) and verbs (actions). Thus, a
document can be seen as a set of entities and their contexts. For details about the
document model we refer the reader to [7]. In order to obtain the representation of the
documents, the system begins by preprocessing each document with MACO; this process
is performed off-line. Once the document collection has been tagged, the system
extracts the lexical contexts associated with named entities. The context considered for
this experiment consists of the four verbs or nouns both to the left and to the right of
the corresponding NE. The final step in the indexing stage is the storage of the extracted
contexts, populating a relational database 5 which preserves several relations between
each named entity, its semantic class, associated contexts, and the documents where they
appeared. In other words, the index is an adaptation of the well-known inverted file
structure used in several information retrieval systems. Given the information required
by the system, the indexing and searching modules were developed from scratch.
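The context extraction step can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: the tag set ("NE", "N", "V", "O") is hypothetical and much coarser than the tags produced by MACO, and extract_contexts is not a function of the original system.

```python
def extract_contexts(tagged, window=4):
    """Collect, for each named entity, up to `window` nouns or verbs on each side.

    tagged: list of (token, tag) pairs; tags "N" and "V" mark nouns and verbs,
    "NE" marks named entities, anything else is ignored.
    """
    contexts = []
    for i, (token, tag) in enumerate(tagged):
        if tag != "NE":
            continue
        left = [t for t, g in tagged[:i] if g in ("N", "V")][-window:]
        right = [t for t, g in tagged[i + 1:] if g in ("N", "V")][:window]
        contexts.append((token, left, right))
    return contexts


tagged = [
    ("inauguration", "N"), ("as", "O"), ("president", "N"), ("of", "O"),
    ("Mexico", "NE"), ("Vicente_Fox", "NE"), ("promised", "V"), ("reforms", "N"),
]
contexts = extract_contexts(tagged)
```

Each (entity, left context, right context) triple would then be stored in the index together with the entity class and the document identifier.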

5 Searching 

The search engine developed for the system and the searching process differ in sev- 
eral aspects from traditional search engines. This process relies on two information 
sources: first the information gathered from question processing, i.e., the expected 



5 Due to performance constraints, the index has been distributed over a cluster of 5 CPUs. 
semantic class of the answer to the question, and the named entities and lexical
context of the question; and second, the index of named entities, contexts and documents
created during indexing.

5.1 Searching Algorithm 

With the document representation, all the named entities mentioned in a given
document can be known beforehand. Thus, the named entities from the question become
key elements in order to define the document set most likely to provide the answer.
For instance, in the question "¿Cuál es el nombre del presidente de México?" 6, the
named entity "México" narrows the set of documents to only those containing that
named entity. At the same time, another assumption is that the context in the
neighborhood of the answer has to be similar to the lexical context of the question.
Once more, for the question of the example, the fragment "even before his inauguration
as president of Mexico, Vicente Fox ..." contains a lexical context next to the answer
which is similar to that of the question.

The algorithm in detail is as follows:

1. Identify the set of relevant documents according to the named entities in the 
question. 

2. Retrieve all contexts in each relevant document. 

3. Compute the similarity between question context and those obtained in step 2. 

3.1. Preserve only those contexts whose associated named entity corresponds 
to the semantic class of the question. 

3.2. Compute a similarity function based on frequencies to perform further 
ranking and answer selection. 

4. Rank the candidate named entities in decreasing order of similarity. 

5. Store similarity and named entity classification information (step 3.2) for next 
stage. 
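The steps above can be sketched in a few lines of Python. This is an illustrative reading of the algorithm, not the system's code: the index is assumed to be a list of dictionaries, the similarity function is assumed to be a simple count of shared context words (the paper only says it is based on frequencies), and the index entries are toy data.

```python
def search(index, question_entities, question_context, expected_class):
    """Rank candidate named entities for a question.

    index: list of dicts with keys "doc", "entity", "cls" and "context" (a set).
    """
    # Step 1: documents that mention every named entity of the question.
    docs = {entry["doc"] for entry in index}
    for ne in question_entities:
        docs &= {entry["doc"] for entry in index if entry["entity"] == ne}
    scores = {}
    for entry in index:
        # Steps 2 and 3.1: contexts of relevant documents with the expected class.
        if entry["doc"] not in docs or entry["cls"] != expected_class:
            continue
        # Step 3.2: similarity as the number of shared context words (assumption).
        sim = len(entry["context"] & question_context)
        scores[entry["entity"]] = max(sim, scores.get(entry["entity"], 0))
    # Step 4: decreasing similarity, ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = [
    {"doc": "d1", "entity": "Mexico", "cls": "place", "context": {"president", "inauguration"}},
    {"doc": "d1", "entity": "Vicente Fox", "cls": "person", "context": {"president", "name"}},
    {"doc": "d1", "entity": "Other Person", "cls": "person", "context": {"meeting"}},
    {"doc": "d2", "entity": "Someone Else", "cls": "person", "context": {"president", "name"}},
]
ranking = search(index, ["Mexico"], {"name", "president"}, "person")
```

The candidate from document d2 is discarded because that document does not mention the question's named entity, even though its context matches well.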



6 Answer Selection 

Analyzing the output from the local index, we found that we had many possible
answers with the same values for similarity and named entity classification
information. Thus, we developed a method for selecting the final answer based on
answers retrieved from the Internet and automated classification of answers using a
bagged ensemble of J48 [15].

The final answer presented by our system was selected by calculating the
intersection among words between the local index candidate answers and the answers
provided by the Internet search. We consider the candidate answer with the highest
intersection value to be the most likely correct answer. However, in some cases all
the candidate answers have the same intersection value. In this case we selected from
the candidates the first one classified by the learning algorithm as belonging to the
positive class. When no positive answer was found among the candidates for a
question, we selected the first candidate answer with the highest value from the
local index.
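This selection policy can be written down compactly. The Python sketch below is one possible reading of the description, assuming the local candidates arrive ranked best first and are non-empty, and that the classifier is available as a predicate; select_answer and its parameters are hypothetical names.

```python
def select_answer(local, web, is_positive):
    """Choose the final answer among local candidates.

    local: candidate answers from the local index, best first (non-empty).
    web: answers gathered from the Internet search.
    is_positive: callable telling whether the classifier accepts a candidate.
    """
    web_words = {w.lower() for answer in web for w in answer.split()}
    overlap = [len({w.lower() for w in cand.split()} & web_words) for cand in local]
    if len(set(overlap)) > 1:
        # Candidates differ: take the one sharing most words with web answers.
        return local[overlap.index(max(overlap))]
    # All intersection values are equal: first candidate the classifier accepts.
    for cand in local:
        if is_positive(cand):
            return cand
    # No positive candidate: keep the top-ranked local candidate.
    return local[0]


answer = select_answer(
    ["Vicente Fox", "Other Person"],
    ["presidente Vicente Fox", "Fox"],
    lambda cand: False,
)
```

In the usage example the web evidence alone decides, so the classifier is never consulted.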



6 What is the name of the president of Mexico? 

The following sections briefly describe the Internet search and the answer
classification processes.

6.1 Internet Searching 

As mentioned earlier, at the final stage the system uses information from the Internet in
order to get more evidence of the possible accuracy of each candidate answer. From
the perspective of the overall system, Internet searching occurs simultaneously with the
local search. This subsection reviews the process involved in this task.

The module used at this step was originally developed at our laboratory to research
the effectiveness of a statistical approach to web question answering in Spanish. This
approach relies on the concept of redundancy in the web, i.e., the module applies
several transformations in order to convert the question into a typical query, and then
this query, along with some query reformulations, is sent to a search engine with the
hypothesis that the answer will be contained several times in the snippets retrieved
by the search engine 7. The selection of candidate answers from the Internet is based on
computing all the n-grams, from unigrams to pentagrams, as possible answers to the
given question. Then, using some statistical criteria, the n-grams are ranked by
decreasing likelihood of being the correct answer. The top ten are used to validate the
candidates gathered from the local searching process.
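The n-gram step can be illustrated with a short Python sketch. The statistical criteria used for ranking are not specified in the text, so raw frequency across snippets is assumed here as a stand-in; the snippets are invented examples and rank_ngrams is a hypothetical name.

```python
from collections import Counter


def rank_ngrams(snippets, max_n=5, top=10):
    """Count all n-grams (unigrams to pentagrams) and return the most frequent."""
    counts = Counter()
    for snippet in snippets:
        words = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)


snippets = [
    "vicente fox presidente",
    "presidente vicente fox",
    "vicente fox",
]
top_ngrams = dict(rank_ngrams(snippets))
```

A practical version would also discard n-grams made only of question words or stopwords before ranking, which this sketch does not attempt.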

6.2 Answer Classification 

Discriminating among possible answers was posed as a learning problem. Our goal
was to train a learning algorithm capable of selecting from a set of possible candidates
the answer that most likely satisfies the question. We selected as features the values
computed by the local indexing. We used five attributes: 1) the number of times the
possible answer was labeled as the entity class of the question; 2) the number of times
the possible entity appeared labeled as a different entity class; 3) the number of words
in common between the context of the possible answer and the context of the question,
excluding named entities; 4) the number of entities that matched the entities in the
question; and 5) the frequency of the possible answer across the whole collection of
documents. With these attributes, we then trained a bagged ensemble of classifiers using
as base learning algorithm the rule induction algorithm J48 [9].

In this work we built the ensemble using the bagging technique, which consists of
manipulating the training set [1].

Given that we had available only one small set of questions, we evaluated the
classification process in two parts. We divided the set of questions into two subgroups of
the same size and performed two runs. In each run, we trained on one half and tested
on the other.



7 System Evaluation 

The evaluation of the system was made following the methodology used in the past
QA track at CLEF-2003 [6]. The criteria used in this track are summarized below.



7 The search engine used by this module is Google (http://www.google.com). 

The document collection used was EFE94, provided by the Spanish news agency
EFE. The collection contains a total of 215,738 documents (509 MB). The question
set is formed by 200 questions, 20 of which have no answer in the document set. For such
questions the system has to answer with the string NIL. Answers were judged to be
incorrect (W) when the answer string did not contain the answer or when the answer
was not responsive. In contrast, a response was considered to be correct (R) when the
answer string consisted of nothing more than the exact, minimal answer and when the
document returned supported the response. Unsupported answers (U) were correct but
it was impossible to infer that they were responsive from the retrieved document.
Answers were judged as non-exact (X) when the answer was correct and supported by
the document, but the answer string missed bits of the response or contained more
than just the exact answer. In strict evaluation, only correct answers (R) scored points,
while in lenient evaluation the unsupported responses (U) were considered to be
correct, too.

The score of each question was the reciprocal of the rank of the first answer
judged correct (1, 0.5, 0.333, or 0 points), depending on the confidence
ranking. The basic evaluation measure is the Mean Reciprocal Rank (MRR),
which represents the mean score over all questions. MRR takes into consideration
both recall and precision of the systems' performance, and can range between
0 (no correct responses) and 1 (all the 200 queries have a correct answer at
position one).

7.1 Results 

Table 1 shows the results obtained by our system. The total number of questions correctly
answered is 85, which represents 42.5% of the question set. It is important to remark
that 87% of the correct answers were given as the first candidate by the system.



Table 1. Results gathered from the system after processing the QA@CLEF-2003 question set

Rank                         1st    2nd    3rd
Number of correct answers    74     9      2
Total of correct answers     85 (42.5%)
Mean Reciprocal Rank         0.3958
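The MRR in Table 1 follows directly from the per-rank counts and the 200 questions of the test set. A short Python check, with mean_reciprocal_rank as a hypothetical helper name:

```python
def mean_reciprocal_rank(correct_at_rank, n_questions):
    """MRR from a mapping rank -> number of questions first answered at that rank.

    Questions with no correct answer contribute a score of 0.
    """
    return sum(count / rank for rank, count in correct_at_rank.items()) / n_questions


mrr = mean_reciprocal_rank({1: 74, 2: 9, 3: 2}, 200)
first_rank_share = 74 / (74 + 9 + 2)
```

The computation gives (74 + 9/2 + 2/3) / 200, which rounds to 0.3958, and the share of correct answers found at the first rank is 74/85, about 87%.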



Table 2 compares the best run (Alicex031ms) presented
last year in the QA monolingual task for Spanish [13] with the results obtained
by our system in this work (Inaoe).

Table 2. Results from QA@CLEF-2003 monolingual task and our system

               Strict               Lenient
Run            MRR       Correct    MRR       Correct
Alicex031ms    0.3075    40.0%      0.3208    43.5%
Inaoe          0.3958    42.5%      -         -

Given the approach followed by our system, it cannot be evaluated under lenient
parameters, i.e., the system provides named entities as answers, avoiding non-exact
(X) or unsupported (U) answers. However, the MRR achieved by our approach is
higher than both the strict and the lenient MRR of Alicex031ms.



8 Conclusions 

This work has presented a lexical-context approach for QA in Spanish. This approach
has been evaluated on a standard test bed, which demonstrated its functionality. The
strength of this work lies in the model used for the source documents. The identification
and annotation in advance of named entities and their associated contexts serves
as key information in order to select possible answers to a given factoid question. On
the other hand, the discrimination of candidate answers is a complex task that requires
more research and experimentation with different methods. In this work we have
experimented with the merging of evidence coming from three main sources: a ranked
list of candidate answers gathered by a similarity measure, answer classification by a
bagged ensemble of classifiers, and a set of candidate answers gathered from the
Internet. Further work includes exploring the inclusion of more information as part of
the context, the refinement of the semantic classes for questions and named entities,
and the improvement of the answer selection methodology.

Abstract. We present a new branch and bound algorithm for Max-SAT
which incorporates original lazy data structures, a new variable selection
heuristic and a lower bound of better quality. We provide experimental
evidence that our solver outperforms some of the best performing
Max-SAT solvers on a wide range of instances.

Keywords: Max-SAT, branch and bound, lower bound, heuristics, data 
structures. 



1 Introduction 

In recent years we have seen an increasing interest in propositional satisfiability
(SAT) that has led to the development of fast and sophisticated complete
SAT solvers like Chaff, Grasp, RelSat and Satz, which are based on the
well-known Davis-Putnam-Logemann-Loveland (DPLL) procedure [5]. Given a
Boolean CNF formula φ, such algorithms determine whether there is a truth
assignment that satisfies φ. Unfortunately, they are not able to solve a well-known
satisfiability optimization problem: Max-SAT. Given a Boolean CNF formula φ,
Max-SAT consists of finding a truth assignment that maximizes the number of
satisfied clauses in φ. When all the clauses have at most k literals per clause,
Max-SAT is called Max-k-SAT.
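To make the problem statement concrete, the following Python sketch solves tiny Max-SAT instances by exhaustive enumeration. It is only a reference definition of the problem, exponential in the number of variables, and has nothing to do with the branch and bound solver presented in this paper; the DIMACS-style clause encoding is an assumption of the sketch.

```python
from itertools import product


def max_sat(clauses, n_vars):
    """Return (max number of satisfied clauses, one maximizing assignment).

    clauses: lists of non-zero integers, where 3 means x3 and -3 means not x3.
    """
    best_count, best_assignment = -1, None
    for bits in product([False, True], repeat=n_vars):
        # A clause is satisfied when at least one of its literals is true.
        satisfied = sum(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied > best_count:
            best_count, best_assignment = satisfied, bits
    return best_count, best_assignment


# x1, not x1, (x1 or x2), not x2: no assignment satisfies all four clauses.
count, assignment = max_sat([[1], [-1], [1, 2], [-2]], 2)
```

Here at most three of the four clauses can be satisfied, for instance with x1 true and x2 false.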

To the best of our knowledge, there are only three exact algorithms for Max-SAT
that are variants of the DPLL procedure. The first was developed by Wallace
& Freuder [15] (WF), the second was developed by Borchers & Furman [3]
(BF), and the third, which is based on BF, was developed by Alsinet, Manyà
& Planes [2] (AMP). All of them are depth-first branch and bound algorithms.
The first was implemented in Lisp, while the rest were implemented in C and
are publicly available. There are other exact algorithms for Max-SAT, but they are
based on mathematical programming techniques [4,6,8]. There are also two exact
DPLL-based algorithms for solving Max-2-SAT: one is due to Zhang, Shen &
Manyà [16] (ZSM), and the other to Alber, Gramm & Niedermeier [1] (AGN).

In this paper we first present a new branch and bound algorithm for Max- 
SAT which incorporates original lazy data structures, a new variable selection 
heuristic, and a lower bound of better quality. We then report on an experimen- 
tal investigation we have conducted in order to evaluate our solver on Max-SAT 
instances. The results obtained provide experimental evidence that our solver 
outperforms some of the best performing existing Max-SAT solvers on a wide 
range of instances. 

Our new Max-SAT solver, which we call Lazy, differs from BF and AMP 
in the data structures used to represent and manipulate CNF formulas, in the 
simplification preprocessing techniques applied, in the lower bound, in the vari- 
able selection heuristic and in the incorporation of the Dominating Unit Clause 
(DUC) rule. 

2 Branch and Bound for Max-SAT 

The space of all possible assignments for a CNF formula $\phi$ can be 
represented as a search tree, where internal nodes represent partial 
assignments and leaf nodes represent complete assignments. A branch and bound 
algorithm for Max-SAT explores the search tree in a depth-first manner. At 
every node, the algorithm compares the number of clauses unsatisfied by the 
best complete assignment found so far, called the upper bound ($UB$), with the 
number of clauses unsatisfied by the current partial assignment ($unsat$) plus 
an underestimation of the number of clauses that become unsatisfied if we 
extend the current partial assignment into a complete assignment 
($underestimation$). The sum $unsat + underestimation$ is called the lower 
bound ($LB$). Obviously, if $UB \leq LB$, a better assignment cannot be found 
from this point in the search. In that case, the algorithm prunes the subtree 
below the current node and backtracks to a higher level in the search tree. 
If $UB > LB$, it extends the current partial assignment by instantiating one 
more variable $p$, which creates two branches from the current branch: the 
left branch corresponds to instantiating the new variable to false, and the 
right branch corresponds to instantiating the new variable to true. In that 
case, the formula associated with the left (right) branch is obtained from the 
formula of the current node by deleting all the clauses containing the literal 
$\neg p$ ($p$) and removing all the occurrences of the literal $p$ ($\neg p$); 
i.e., the algorithm applies the one-literal rule [9]. The solution to Max-SAT 
is the value that $UB$ takes after exploring the entire search tree. 

Borchers & Furman [3] designed and implemented a branch and bound solver 
for Max-SAT, called BF herein, that incorporates two quite significant 
improvements: 

- Before starting to explore the search tree, they obtain an upper bound on the 
  number of unsatisfied clauses in an optimal solution using the local search 
  procedure GSAT [13]. 

- When branching is done, branch and bound algorithms for Max-SAT apply the 
  one-literal rule (simplifying with the branching literal) instead of 
  applying unit propagation as in the DPLL-style solvers for SAT (by unit 
  propagation we mean the repeated application of the one-literal rule until a 
  saturation state is reached). If unit propagation is applied at each node, 
  the algorithm can return a non-optimal solution. However, when the 
  difference between the lower bound and the upper bound is one, unit 
  propagation can be safely applied, because otherwise by fixing to false any 
  literal of any unit clause we reach the upper bound. Borchers & Furman 
  perform unit propagation in that case. 

The improvements incorporated into BF are also used in the rest of the 
DPLL-based Max-SAT solvers considered in this paper (i.e., AMP, ZSM, AGN). 

The lower bound and variable selection heuristic of BF are: 

- $LB_{BF} = unsat$. Note that the number of clauses unsatisfied by the 
  current partial assignment coincides with the number of empty clauses that 
  the formula associated with the current partial assignment contains. In this 
  elementary lower bound there is no underestimation of the number of clauses 
  that become unsatisfied if we extend the current partial assignment into a 
  complete assignment. 

- MOMS [12]: selects a variable among those that appear most often in clauses 
  of minimum size. This is the heuristic of Borchers & Furman. Ties are broken 
  by choosing the first variable in lexicographical order. 

In a previous paper [2], we incorporated two improvements into BF that led to 
significant performance improvements: a lower bound of better quality 
($LB_{AMP}$) and another variable selection heuristic (JW): 

- $LB_{AMP} = unsat + \sum_{p \text{ occurs in } \phi'} \min(ic(p), ic(\neg p))$, 
  where $\phi'$ is the formula associated with the current partial assignment, 
  and $ic(p)$ ($ic(\neg p)$), the inconsistency count of $p$ ($\neg p$), is the 
  number of clauses that become unsatisfied if the current partial assignment 
  is extended by fixing $p$ to true (false). Note that $ic(p)$ ($ic(\neg p)$) 
  coincides with the number of unit clauses of $\phi'$ that contain $\neg p$ 
  ($p$). 

If for each variable we count the number of positive and negative literals in 
unit clauses, we can know the number of unit clauses that will not be satisfied 
if the variable is instantiated to true or false. Obviously, the total number 
of unsatisfied clauses resulting from either instantiation of the variable must 
be greater than or equal to the minimum count. Moreover, the counts for 
different variables are independent, since they refer to different unit clauses. 
Hence, by summing the minimum count for all variables in unit clauses and 
adding this sum to the number of empty clauses, we calculate a lower bound 
for the number of unsatisfied clauses given the current assignment. Such a 
lower bound was considered in [15]. 

- Jeroslow-Wang (JW) [7]: given a formula $\phi$, for each literal $l$ of 
  $\phi$ the following function is defined: 
  $J(l) = \sum_{C \in \phi,\, l \in C} 2^{-|C|}$, where $|C|$ is the length of 
  clause $C$. JW selects a variable $p$ of $\phi$ among those that maximize 
  $J(p) + J(\neg p)$. 
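
The lower bound and the heuristic described in the two items above can be illustrated with a short Python sketch. The clause representation (tuples of non-zero integers, where -v stands for the negation of variable v) and the names `lb_amp`, `jw_scores` and `jw_select` are assumptions made for this illustration; they are not taken from the AMP implementation.

```python
from collections import Counter, defaultdict

def lb_amp(clauses):
    """unsat + sum over variables of min(ic(p), ic(not p))."""
    unsat = sum(1 for c in clauses if len(c) == 0)        # empty clauses
    units = Counter(c[0] for c in clauses if len(c) == 1)  # literal -> number of unit clauses
    variables = {abs(l) for l in units}
    # ic(p) is the number of unit clauses "not p"; ic(not p) is the number of unit clauses "p".
    return unsat + sum(min(units[-v], units[v]) for v in variables)

def jw_scores(clauses):
    """J(l) = sum of 2^(-|C|) over the clauses C that contain literal l."""
    J = defaultdict(float)
    for c in clauses:
        for l in c:
            J[l] += 2.0 ** -len(c)
    return J

def jw_select(clauses):
    """Return a variable p maximizing J(p) + J(not p) (lowest index on ties)."""
    J = jw_scores(clauses)
    variables = sorted({abs(l) for c in clauses for l in c})
    return max(variables, key=lambda v: J[v] + J[-v])
```

For the formula `[(), (1,), (1,), (-1,), (2,), (2, 3), (-3, 1)]`, the sketch counts one empty clause plus one complementary pair of unit clauses on variable 1.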

Our solver AMP is basically BF with the above lower bound and variable 
selection heuristic. Figure 1 shows the pseudo-code of the skeleton of BF and 
AMP. We use the following notation: empty-clauses($\phi$) is a function that 
returns the number of empty clauses in $\phi$; lower-bound($\phi$) is the sum 
of the number of empty clauses in $\phi$ plus an underestimation of the number 
of unsatisfied clauses in the formula obtained from $\phi$ by removing its 
empty clauses (in our case, $LB_{BF}$ or $LB_{AMP}$); $ub$ is an upper bound of 
the number of unsatisfied clauses in an optimal solution (we assume that the 
input value is that obtained with GSAT); select-variable($\phi$) is a function 
that returns a variable of $\phi$ following a heuristic (in our case, MOMS or 
JW); and $\phi_p$ ($\phi_{\neg p}$) is the formula obtained by applying the 
one-literal rule to $\phi$ using the literal $p$ ($\neg p$). 

Input: max-sat($\phi$, $ub$): a Boolean CNF formula $\phi$ and an upper bound $ub$ 
    if $\phi = \emptyset$ or $\phi$ only contains empty clauses then 
        return empty-clauses($\phi$) 
    end if 
    if lower-bound($\phi$) $\geq ub$ then 
        return $\infty$ 
    end if 
    if $unsat = ub - 1$ then 
        $\phi \leftarrow$ unit-propagation($\phi$) 
    end if 
    $p \leftarrow$ select-variable($\phi$) 
    $ub \leftarrow \min(ub,$ max-sat($\phi_{\neg p}$, $ub$)$)$ 
    return $\min(ub,$ max-sat($\phi_p$, $ub$)$)$ 
Output: The minimum number of clauses of $\phi$ that can be unsatisfied 

Fig. 1. Branch and Bound for Max-SAT 
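
A minimal Python sketch of the skeleton in Fig. 1 may help to follow the recursion. It is an illustration under stated assumptions rather than the BF or AMP code: clauses are tuples of non-zero integers (-v is the negation of variable v), the lower bound is the elementary one (number of empty clauses), `select_variable` simply picks the lowest-numbered variable, and the unit propagation step of Fig. 1 is omitted. All function names are hypothetical.

```python
from math import inf

def simplify(clauses, lit):
    """One-literal rule: make `lit` true, drop satisfied clauses, remove -lit."""
    return [tuple(x for x in c if x != -lit) for c in clauses if lit not in c]

def empty_clauses(clauses):
    return sum(1 for c in clauses if not c)

def lower_bound(clauses):
    # Elementary bound: clauses already falsified by the partial assignment.
    return empty_clauses(clauses)

def select_variable(clauses):
    # Placeholder heuristic (assumption): lowest-numbered remaining variable.
    return min(abs(x) for c in clauses for x in c)

def max_sat(clauses, ub):
    """Minimum number of unsatisfied clauses, never more than the bound ub."""
    if all(not c for c in clauses):           # no clause left, or only empty ones
        return empty_clauses(clauses)
    if lower_bound(clauses) >= ub:            # prune: cannot improve on ub
        return inf
    p = select_variable(clauses)
    ub = min(ub, max_sat(simplify(clauses, -p), ub))   # left branch: p = false
    return min(ub, max_sat(simplify(clauses, p), ub))  # right branch: p = true
```

In this sketch the initial upper bound can be any value strictly greater than the optimum, for instance the number of clauses plus one; the paper instead starts from the bound found by GSAT.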

2.1 A New Max-SAT Solver 

Our new Max-SAT solver, which we call Lazy, differs from previous solvers in 
the data structures used to represent and manipulate CNF formulas, in the 
lower bound, in a novel variable selection heuristic, in the preprocessing of such 
formulas and in the incorporation of the Dominating Unit Clause (DUC) rule. 
Lazy was implemented in C++. 

Data Structures. BF and AMP use adjacency lists to represent CNF formulas 
and their variable selection heuristics are dynamic. Lazy uses a static variable 
selection heuristic (defined below) that allows us to implement extremely effi- 
cient data structures for representing and manipulating CNF formulas. Our data 
structures take into account the following fact: we are only interested in knowing 
when a clause has become unit or empty. Thus, if we have a clause with four vari- 
ables, we do not perform any operation in that clause until three of the variables 
appearing in the clause have been instantiated; i.e., we delay the evaluation of a 
clause with $k$ variables until $k - 1$ variables have been instantiated. In our case, 
as we instantiate the variables using a static order, we do not have to evaluate 
a clause until the penultimate variable of the clause in the static order has been 
instantiated. 

The data structures are defined as follows: For each clause we have a pointer 
to the penultimate variable of the clause in the static order, and the clauses 
of a CNF formula are ordered by that pointer. We also have a pointer to the 
last variable of the clause. When a variable $p$ is fixed to true (false), only 
the clauses whose penultimate variable in the static order is $\neg p$ ($p$) 
are evaluated. This approach has two advantages: the cost of backtracking is 
constant (we do not have to undo pointers like in adjacency lists) and, at each 
step, we evaluate a minimum number of clauses. 

Fig. 2. Lazy data structure snapshot example, with one panel for $p_3$ = true 
and one for $p_3$ = false. The arrow L stands for the pointer to the last 
variable and the arrow P stands for the pointer to the penultimate variable 

For instance, suppose we have a formula with the following clauses and that 
variables are instantiated in lexicographic order: 

    $p_2 \vee \neg p_3 \vee \neg p_6$ 
    $p_5 \vee p_3 \vee p_1$ 
    $\neg p_3 \vee p_2 \vee \neg p_4$ 

If $p_1$ and $p_2$ have been instantiated to false, when we branch on $p_3$ = 
true we derive the unit clauses $\neg p_6$ and $\neg p_4$; and when we branch 
on $p_3$ = false we derive the unit clause $p_5$. The data structure snapshot 
is shown in Fig. 2. 
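
The example can be replayed with a small Python sketch of the lazy idea: clauses are bucketed by their penultimate variable in the static order and are only looked at when that variable is instantiated. The representation (tuples of non-zero integers, -v for a negated variable) and the names `build_buckets` and `derived_units` are assumptions of this sketch; the actual solver uses pointers in C++ and also has to detect empty clauses, which is left out here.

```python
from collections import defaultdict

def build_buckets(clauses, order):
    """Group clauses (of length >= 2) by their penultimate variable in `order`."""
    rank = {v: i for i, v in enumerate(order)}
    buckets = defaultdict(list)
    for clause in clauses:
        lits = sorted(clause, key=lambda l: rank[abs(l)])
        if len(lits) >= 2:
            buckets[abs(lits[-2])].append(lits)
    return buckets

def derived_units(buckets, assignment, var):
    """Unit clauses obtained right after `var` has been instantiated.

    Only the clauses whose penultimate variable is `var` are evaluated; every
    earlier variable of such a clause is already assigned because the order
    is static.
    """
    units = []
    for lits in buckets[var]:
        *prefix, last = lits
        satisfied = any(assignment[abs(l)] == (l > 0) for l in prefix)
        if not satisfied:
            units.append(last)   # all earlier literals are false
    return units

clauses = [(2, -3, -6), (5, 3, 1), (-3, 2, -4)]
buckets = build_buckets(clauses, order=[1, 2, 3, 4, 5, 6])
on_true = derived_units(buckets, {1: False, 2: False, 3: True}, 3)
on_false = derived_units(buckets, {1: False, 2: False, 3: False}, 3)
```

Backtracking needs no undo step in this sketch because the buckets are never modified during search.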

Lower Bound. Lazy incorporates a lower bound ($LB_{Lazy}$) of better quality 
than $LB_{BF}$ and $LB_{AMP}$. $LB_{Lazy}$ can be understood as $LB_{AMP}$ 
extended with a specialization of the so-called star rule defined in [11]. The 
star rule states that if we have a clause of the form 
$l_1 \vee \cdots \vee l_k$, where $l_1, \ldots, l_k$ are literals, and $k$ unit 
clauses of the form $\neg l_1, \ldots, \neg l_k$, then the lower bound can be 
incremented by one. In our case, we only consider clauses of length two. For 
longer clauses the star rule did not lead to performance improvements in our 
experimental investigation. The pseudo-code of $LB_{Lazy}$ is defined as 
follows [14]: 

    $LB_{Lazy} := LB_{AMP}$ 
    for every clause $l_1 \vee l_2 \in \phi$ do 
        if $\neg l_1 \in \phi$ and $\neg l_2 \in \phi$ then 
            $LB_{Lazy} := LB_{Lazy} + 1$ 
            $\phi := \phi - \{l_1 \vee l_2, \neg l_1, \neg l_2\}$ 
        end if 
    end for 
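
A hedged Python rendering of this pseudo-code is given below. It assumes clauses are tuples of non-zero integers (-v for a negated variable). One design choice is an assumption of the sketch and is not stated in the pseudo-code: unit clauses already paired off by the $LB_{AMP}$ part are not reused by the star rule, so that every increment is justified by a disjoint set of clauses. The name `lb_lazy` is hypothetical.

```python
from collections import Counter

def lb_lazy(clauses):
    """Empty clauses + complementary unit pairs + star rule on binary clauses."""
    lb = sum(1 for c in clauses if len(c) == 0)
    units = Counter(c[0] for c in clauses if len(c) == 1)
    # LB_AMP part: each pair of unit clauses {p, not p} costs one clause.
    for v in {abs(l) for l in units}:
        m = min(units[v], units[-v])
        lb += m
        units[v] -= m
        units[-v] -= m
    # Star rule restricted to binary clauses: {l1 v l2, not l1, not l2}.
    for c in clauses:
        if len(c) == 2:
            l1, l2 = c
            if l1 != l2 and units[-l1] > 0 and units[-l2] > 0:
                lb += 1
                units[-l1] -= 1   # remove the two unit clauses just used
                units[-l2] -= 1
    return lb
```

For example, in `[(1, 2), (-1,), (-1,), (1,), (-2,)]` one pair of unit clauses on variable 1 is counted first, and the remaining unit clauses `(-1,)` and `(-2,)` together with `(1, 2)` trigger the star rule once.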

Variable Selection Heuristic. MOMS is a branching heuristic that selects 
a variable among those that appear more often in clauses of minimum size, 
and breaks ties by choosing the first variable in lexicographical order. We have 
defined MOMS*, which is like MOMS but breaks ties in a different way. In 
MOMS* ties are broken by choosing the variable $p$ with the highest weight $w$. 
Such a weight is defined as follows: $w(p) = \prod_{l \in S_p} occur(l)$, where 
$S_p$ denotes the set of negated neighbouring literals of variable $p$, and 
$occur(l)$ denotes the number of occurrences of literal $l$. A literal $\neg l$ 
is a negated neighbouring literal of variable $p$ if $l$ occurs in a clause 
that contains the literal $p$ or the literal $\neg p$. The reason behind that 
calculation is that any clause $l_1 \vee l_2 \vee l_3$ can be seen as the 
implication $\neg l_1 \wedge \neg l_2 \rightarrow l_3$. So, the greater the 
product $occur(\neg l_1) \cdot occur(\neg l_2)$, the higher the probability of 
creating the unit clause $l_3$. 

Note that MOMS is used as a dynamic variable selection heuristic in BF 
while MOMS* is used as a static variable selection heuristic in Lazy. 
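
The tie-breaking weight of MOMS* can be sketched as follows. This is an illustration only: clauses are assumed to be tuples of non-zero integers (-v for a negated variable), the literals of $p$ itself are assumed not to count as its own neighbours, and the name `moms_star_weight` is hypothetical.

```python
from collections import Counter
from math import prod

def moms_star_weight(clauses, p):
    """w(p): product of occur(l) over the negated neighbouring literals of p."""
    occur = Counter(l for c in clauses for l in c)
    # S_p: negations of the literals sharing a clause with p or with not p.
    s_p = {-l for c in clauses if p in c or -p in c for l in c if abs(l) != p}
    return prod(occur[l] for l in s_p)

formula = [(1, 2, 3), (-2, 4), (-2, -3), (-3, 5)]
w1 = moms_star_weight(formula, 1)   # S_1 = {-2, -3}
w2 = moms_star_weight(formula, 2)   # S_2 contains -1, which never occurs
```

Note that, read literally, the product is zero as soon as one negated neighbour does not occur in the formula.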

Formula Reduction Preprocessing. Before the search starts, the initial for- 
mula is simplified by applying the resolution rule to some binary clauses. For 
every pair of clauses $p_1 \vee p_2$ and $\neg p_1 \vee p_2$ such that variable 
$p_1$ precedes variable $p_2$ in the static instantiation order, Lazy reduces 
them to the unit clause $p_2$, and for every pair of clauses 
$p_1 \vee \neg p_2$ and $\neg p_1 \vee \neg p_2$ such that variable $p_1$ 
precedes variable $p_2$ in the static instantiation order, Lazy reduces them to 
the unit clause $\neg p_2$. 
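
The reduction preserves the number of unsatisfied clauses: if $p_2$ is true both clauses of a pair are satisfied, and if $p_2$ is false exactly one of them is falsified, just like the unit clause $p_2$. A possible Python sketch, assuming clauses are tuples of non-zero integers (-v for a negated variable) and treating the formula as a multiset, is shown below; the function name `reduce_binary_pairs` is hypothetical.

```python
from collections import Counter

def reduce_binary_pairs(clauses, order):
    """Replace pairs {a v b, not a v b} by the unit clause b.

    `a` is the literal on the variable that comes first in the static order.
    """
    rank = {v: i for i, v in enumerate(order)}
    binary = Counter()
    result = []
    for c in clauses:
        if len(c) == 2 and abs(c[0]) != abs(c[1]):
            a, b = sorted(c, key=lambda l: rank[abs(l)])
            binary[(a, b)] += 1          # normalised: earlier variable first
        else:
            result.append(tuple(c))
    for (a, b) in list(binary):
        if a > 0:
            m = min(binary[(a, b)], binary[(-a, b)])
            binary[(a, b)] -= m
            binary[(-a, b)] -= m
            result.extend([(b,)] * m)    # one unit clause per resolved pair
    result.extend(binary.elements())     # binary clauses that were not paired
    return result

reduced = reduce_binary_pairs(
    [(1, 2), (-1, 2), (3, -4), (-3, -4), (2, 1), (5,), (1, -3)],
    order=[1, 2, 3, 4, 5],
)
```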

Dominating Unit Clause (DUC) Rule. DUC is an inference rule, defined 
in [11], that allows us to fix the truth value of a variable; i.e., it avoids 
branching on that variable. DUC is defined as follows: If a CNF formula $\phi$ 
has $k$ occurrences of a literal $p$ ($\neg p$) and has at least $k$ unit 
clauses of the form $\neg p$ ($p$), then the value of $p$ can be set to false 
(true). 
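
As a rough illustration of the rule, the following Python sketch lists the variables whose value DUC allows us to fix. Clauses are assumed to be tuples of non-zero integers (-v for a negated variable), and the name `duc_forced_values` is hypothetical. With $k = 0$ the condition also covers variables that occur with a single polarity.

```python
from collections import Counter

def duc_forced_values(clauses):
    """Map each variable fixed by the DUC rule to its forced truth value."""
    occur = Counter(l for c in clauses for l in c)          # occurrences per literal
    units = Counter(c[0] for c in clauses if len(c) == 1)   # unit clauses per literal
    forced = {}
    for v in sorted({abs(l) for l in occur}):
        if units[-v] >= occur[v]:      # at least as many units "not v" as occurrences of v
            forced[v] = False
        elif units[v] >= occur[-v]:    # symmetric case
            forced[v] = True
    return forced

forced = duc_forced_values([(1, 2), (-1,), (-1,), (2, 3), (-2,), (3, -4)])
```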

3 Experimental Results 

We conducted an experimental investigation in order to compare the perfor- 
mance of BF, AMP, and Lazy. When dealing with Max-2-SAT instances, we also 
compare with ZSM and AGN. The experiments were performed on a 2 GHz 
Pentium IV with 512 MB of RAM under Linux. 

In our first experiment, we evaluated the relevance of defining lazy data struc- 
tures to get substantial performance improvements. To this end, we compared 
BF and Lazy using a simple variable selection heuristic: variables are 
instantiated in lexicographical order. Moreover, we removed all the 
improvements we introduced into Lazy and replaced $LB_{Lazy}$ with $LB_{BF}$. 
In this way, BF and Lazy traverse the same search tree. Figure 3 shows the 
results obtained when solving sets of randomly generated Max-3-SAT instances with 30 
variables and a different number of clauses. Such instances were generated using 
the method described in [10]. We generated sets for 180, 240, 300, 360, 420 and 
480 clauses, where each set had 100 instances. We observe that this modified 
version of Lazy is about 5 times faster than BF when both solvers traverse the 
same search tree. 

Fig. 3. Experimental results for 30-variable Max-3-SAT instances. Mean time (in 
seconds) 

In our second experiment, we generated sets of random Max-2-SAT instances 
with 50 and 100 variables and a different number of clauses. Such instances were 
generated using the method described in [10], and each set had 100 instances. 
The results of solving such instances with BF, AMP, ZSM, AGN and Lazy are 
shown in Fig. 4. Along the horizontal axis is the number of clauses, and along 
the vertical axis is the mean and median time (in seconds) needed to solve an 
instance of a set. Notice that we use a log scale to represent run-time. Clearly, 
Lazy outperforms the rest of the solvers, even ZSM and AGN, which are 
specifically designed to solve Max-2-SAT instances. 

Fig. 4. Experimental results for 50-variable and 100-variable Max-2-SAT instances. 
Mean time (in seconds) 

In our third experiment, we generated sets of random Max-3-SAT and Max- 
4-SAT instances with 50 variables and a different number of clauses. Such in- 
stances were generated using the method described in [10], and each set had 
100 instances. The results of solving such instances with BF, AMP, and Lazy 
are shown in Fig. 5. Again, we clearly see that Lazy provides substantial perfor- 
mance improvements. 

Fig. 5. Experimental results for 50-variable Max-3-SAT (left) and Max-4-SAT (right) 
instances. Mean time (in seconds) 

The performance improvements of Lazy are due to its lazy data structures, 
the simplification preprocessing techniques applied, the quality of the lower 
bound, the incorporation of the dominating unit clause rule, and the variable 
selection heuristic used. We believe that our results could be further improved 
by adapting the lazy data structures defined in the paper to deal with dynamic 
variable selection heuristics. It would also be interesting to test Lazy on more 
realistic benchmarks. 

Abstract. In order to really understand all aspects of logic-based program 
development of different semantics, it would be useful to have a common solid 
logical foundation. 

The stable semantics are based on G3, but we show that stable semantics can be 
fully represented in the three-valued logic of Lukasiewicz. We construct a 
particular semantics, which we call L3-WFS, that is defined over general 
propositional theories and can be defined via the three-valued logic of 
Lukasiewicz. Interestingly, L3-WFS seems to satisfy most of the principles of a 
well behaved semantics. Hence we propose the three-valued Lukasiewicz logic to 
model WFS, extensions of WFS, and the Stable semantics. 



1 Introduction 

A-Prolog (Stable Logic Programming [11] or Answer Set Programming) is the 
realization of much theoretical work on Nonmonotonic Reasoning and AI appli- 
cations of Logic Programming (LP) in the last 15 years. This is an important 
logic programming paradigm that has now broad acceptance in the community. 
Efficient software to compute answer sets and different applications to model 
real life problems justify this assertion. 

The well founded semantics (WFS) is a very well known paradigm that originated 
at the same time as the stable semantics [20]. The main difference between 
STABLE and WFS is that, in the definition of the former, a guess is made and 
then a particular (2-valued) model is constructed and used to justify the guess 
or to reject it. 
However, in the definition of WFS, more and more atoms are declared to be 
true (or false): once a decision has been drawn, it will never be rejected. WFS 
is based on a single 3-valued intended model. 

Several authors have recognized the interest in semantics whose behavior is 
close to classical logic; see [5-8, 19]. They have extended the WFS semantics 
by putting an additional mechanism on top of its definition. Dix noticed that 
the new semantics sometimes have more serious shortcomings than WFS and hence 
he defined a set of principles against which all semantics should be checked 
[7]. It is worth mentioning that such notions helped Dix to propose the concept 
of well behaved semantics. We think that it is important to understand this 
concept well if one wants to follow any serious methodology for logic-based 
program development. 

We introduce an extension of WFS that we will call L3-WFS with the fol- 
lowing properties: 

1. It is defined based on completions (as the stable semantics) but using the 
well known three-valued logic of Lukasiewicz. 

2. L3-WFS is defined for propositional theories based on basic formulas. 

3. Using the knowledge ordering ($\leq$, see [6]), we have that WFS $\leq$ 
L3-WFS and also WFS+ $\leq$ L3-WFS (WFS+, defined by Dix, also satisfies this 
property). 

4. The known counterexamples to the well-behavedness of several known 
extensions of WFS (such as GWFS and EWFS) do not apply to L3-WFS. 
We conjecture that L3-WFS satisfies several of the principles given for well 
behaved semantics [2]. 

5. L3-WFS is different from GWFS, EWFS, and WFS+. 

We expect the reader to have some familiarity with many valued logics of 
Lukasiewicz and logic programming. 

2 Background 

We consider a formal (propositional) language built from an alphabet containing: 
a denumerable set $\mathcal{L}$ of elements called atoms, the standard 2-place 
connectives $\wedge$, $\vee$, $\rightarrow$, and the 1-place connective $\neg$. 
Formulas and theories are constructed as usual in logic. In this paper we only 
consider finite theories. We will later define other connectives, but only for 
temporary use. 

We define the class of basic formulas 1 recursively as follows: 

    $\neg a$, $a$ if $a$ is an atom. 
    $\alpha \vee \beta$ if $\alpha$, $\beta$ are basic formulas. 
    $\alpha \wedge \beta$ if $\alpha$, $\beta$ are basic formulas. 
    $\alpha \rightarrow \beta$ if $\alpha$, $\beta$ are basic formulas. 

A normal program is a set of rules of the form 

    $A_1 \wedge \ldots \wedge A_m \wedge \neg A_{m+1} \wedge \ldots \wedge \neg A_n \rightarrow A_0$ 

where each $A_i$ for $i = 0, \ldots, n$ is an atom. 

We use the well known definition of a stratified program, see [1]. We use 
the notation $\vdash_X F$ to denote that the formula $F$ is provable (a theorem 
or tautology) in logic $X$. If $T$ is a theory we use the symbol 
$T \vdash_X F$ to denote $\vdash_X (F_1 \wedge \cdots \wedge F_n) \rightarrow F$ 
for some formulas $F_i \in T$. We say that a theory $T$ is consistent if 
$T \nvdash_X \bot$. We also introduce, if $T$ and $U$ are two theories, the 
symbol $T \vdash_X U$ to denote that $T \vdash_X F$ for all formulas 
$F \in U$. We will write $T \Vdash_X U$ to denote the fact that (i) $T$ is 
consistent and (ii) $T \vdash_X U$. 

1 This class of formulas will be used in Sections 4-6. 

Given a class of programs $C$, a semantic operator $Sem$ is a function that 
assigns to each program $P \in C$ a set of sets of atoms 
$M \subseteq \mathcal{L}_P$. These sets of atoms are usually some “preferred” 
two-valued models of the program $P$; each of them is called a $Sem$ model of 
$P$. Sometimes we say that this is the scenarios semantics [8]. Given a 
scenarios semantics $Sem$, we define the sceptical semantics of a program $P$ 
as $Sem(P) = \bigcap \{M \cup \neg\widetilde{M} : M \text{ is a } Sem \text{ model of } P\}$, 
where $\widetilde{M} = \mathcal{L}_P \setminus M$ and 
$\neg\widetilde{M} = \{\neg a : a \in \widetilde{M}\}$. Given two scenarios 
semantics $S_1$ and $S_2$, we define: $S_1 \leq S_2$ if for every program $P$ 
it is true that $S_1(P) \subseteq S_2(P)$. We can easily define $S_1 = S_2$ 
and $S_1 < S_2$. We say that a semantics is stronger than another one 
according to this order. 
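
The sceptical semantics can be computed mechanically once the models are known. The following Python sketch is only an illustration: literals are represented as pairs (atom, sign), the function name `sceptical` is hypothetical, and returning the empty set when there are no models is a convention chosen for the sketch.

```python
def sceptical(models, atoms):
    """Literals shared by all models: intersection of M plus negated complement of M."""
    result = None
    for m in models:
        lits = {(a, True) for a in m} | {(a, False) for a in atoms - m}
        result = lits if result is None else result & lits
    return result if result is not None else set()

# Two scenarios {a, p} and {b, p} over the atoms a, b, p.
shared = sceptical([{"a", "p"}, {"b", "p"}], {"a", "b", "p"})
```

Here only the atom p is true in both scenarios, and no atom is false in both, so a single positive literal survives the intersection.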



3 Three Valued Logic of Lukasiewicz 

The Polish logician and philosopher Jan Lukasiewicz began to create systems 
of many-valued logic in 1920, particularly a system with a third value for 
“possible”, in order to model in this way the modalities “it is necessary that” 
and “it is possible that” 2. The outcome of these investigations are the 
Lukasiewicz systems, and a series of theoretical results concerning these 
systems 3. 

To construct L3 we consider a formal (propositional) language built from 
an alphabet containing: a denumerable set $\mathcal{L}$ of elements called 
atoms, the 2-place connective $\rightarrow_L$, and the 1-place connective 
$\neg_L$. The logical constants $\bot$, $\top$, the connectives $\vee_L$, 
$\wedge_L$, and the modal operators $\Diamond_L$ and $\Box_L$ are defined as 
follows: 



2 Lukasiewicz tried to deal with Aristotle’s paradox of the sea battle: “Two 
admirals, A and B, are preparing their navies for a sea battle tomorrow. The 
battle will be fought until one side is victorious. But the ‘laws’ of the 
excluded middle (no third truth-value) and of noncontradiction (not both 
truth-values), mandate that one of the propositions, ‘A wins’ and ‘B wins’, is 
true (always has been and ever will be) and the other is false (always has been 
and ever will be). Suppose ‘A wins’ is today true. Then whatever A does (or 
fails to do) today will make no difference; similarly, whatever B does (or 
fails to do) today will make no difference: the outcome is already settled. Or 
again, suppose ‘A wins’ is today false. Then no matter what A does today (or 
fails to do), it will make no difference; similarly, no matter what B does (or 
fails to do), it will make no difference: the outcome is already settled. Thus, 
if propositions bear their truth-values timelessly (or unchangingly and 
eternally), then planning, or as Aristotle put it ‘taking care’, is illusory in 
its efficacy. The future will be what it will be, irrespective of our planning, 
intentions, etc.” (For more references about this paradox see: Taylor, Richard 
(1957). “The problem of future contingencies.” Philosophical Review 66: 1-28, or 
visit http://www2.msstate.edu/~jjs87/SO/1093/freedom.html) 

3 http://en.wikipedia.org/wiki/Multi-valued_logic 

Formulas and theories are constructed as usual in logic. 

A tri-valued valuation $v$ for $\mathcal{L}$ is a map 
$v : \mathcal{L} \rightarrow \{0, 1, 2\}$ inductively extended over the set of 
formulas into $\{0, 1, 2\}$ as follows: 

- $v(\bot) = 0$ 

- $v(\neg_L \alpha) = 2 - v(\alpha)$ 

- $v(\alpha \rightarrow_L \beta) = \min\{2, 2 - v(\alpha) + v(\beta)\}$ 

The tautologies of L3 are then the formulas whose truth value is 2 under every 
such valuation. 
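
The valuation clauses translate directly into a small evaluator. The Python sketch below is an illustration under its own conventions: formulas are nested tuples such as `('imp', f, g)`, and the names `value`, `atoms` and `is_tautology` are hypothetical. Tautologies are checked by brute force over all valuations of the atoms of the formula.

```python
from itertools import product

def value(f, v):
    """Truth value in {0, 1, 2} of formula f under the valuation v (dict atom -> value)."""
    op = f[0]
    if op == "bot":
        return 0
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return 2 - value(f[1], v)
    if op == "imp":
        return min(2, 2 - value(f[1], v) + value(f[2], v))
    raise ValueError(f"unknown connective: {op}")

def atoms(f):
    """Set of atom names occurring in f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def is_tautology(f):
    """True if f takes the value 2 under every tri-valued valuation."""
    names = sorted(atoms(f))
    return all(value(f, dict(zip(names, vals))) == 2
               for vals in product((0, 1, 2), repeat=len(names)))

a = ("atom", "a")
identity = ("imp", a, a)                    # a -> a
clavius = ("imp", ("imp", ("not", a), a), a)  # (not a -> a) -> a
```

The second formula is a classical tautology that takes the value 1 when a has the value 1, so it is not a tautology of the three-valued logic.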

In [14] a syntactic characterization of the modal content of L3 is studied and 
the behavior of modal operators is checked against some of the relevant modal 
principles. Minari also studies the axiomatization of L3 as well as its 
relation with modal logics, particularly with S5. An axiomatization for L3 over 
$\mathcal{L}$ is given by the axiom schemes: 

- (L1) $\alpha \rightarrow_L (\beta \rightarrow_L \alpha)$ 

- (L2) $(\alpha \rightarrow_L \beta) \rightarrow_L ((\beta \rightarrow_L \gamma) \rightarrow_L (\alpha \rightarrow_L \gamma))$ 

- (L3) $(\neg_L \beta \rightarrow_L \neg_L \alpha) \rightarrow_L (\alpha \rightarrow_L \beta)$ 

- (L4) $((\alpha \rightarrow_L \neg_L \alpha) \rightarrow_L \alpha) \rightarrow_L \alpha$ 

and the inference rule Modus ponens 

- (MP) If $\alpha, \alpha \rightarrow_L \beta \in$ L3 then $\beta \in$ L3 
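
As a sanity check, one can verify by brute force that the four schemes, read with the three-valued truth functions for negation and implication, always take the value 2, and that Modus ponens preserves the value 2. The Python sketch below does this; the function and variable names are chosen for the illustration only.

```python
from itertools import product

def neg(a):
    return 2 - a

def imp(a, b):
    return min(2, 2 - a + b)

V = (0, 1, 2)

# (L1) a -> (b -> a)
L1 = all(imp(a, imp(b, a)) == 2 for a, b in product(V, V))
# (L2) (a -> b) -> ((b -> c) -> (a -> c))
L2 = all(imp(imp(a, b), imp(imp(b, c), imp(a, c))) == 2
         for a, b, c in product(V, V, V))
# (L3) (not b -> not a) -> (a -> b)
L3 = all(imp(imp(neg(b), neg(a)), imp(a, b)) == 2 for a, b in product(V, V))
# (L4) ((a -> not a) -> a) -> a
L4 = all(imp(imp(imp(a, neg(a)), a), a) == 2 for a in V)
# (MP) whenever a and a -> b have value 2, so does b
MP = all(b == 2 for a, b in product(V, V) if a == 2 and imp(a, b) == 2)
```

This only checks soundness with respect to the valuation; it says nothing about completeness of the axiomatization.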

4 Definition of ASP-WFS Via L3 

We first explain how we are going to use L3 to define our semantics. Stable 
semantics is given in terms of $\neg_{G_3}$ and $\rightarrow_{G_3}$ (see 
footnote 4), but in order to define our L3-WFS semantics we propose an extra 
connective $\neg_{G'_3}$. It is important to note that these connectives are 
abbreviation forms using the standard language of L3. 

In Table 1 you can see the correspondence between the abbreviations and the 
valuation for the connectives. The reader can easily check the correspondence 
of truth values of $a \rightarrow_{G_3} b$ and its abbreviation form in the 
language of L3. 

Then we have two different ‘negations’ and then two different logics, namely 
the original G3 (with its standard connectives $\neg_{G_3}$ and 
$\rightarrow_{G_3}$) and G'3 (with the connectives $\neg_{G'_3}$ and 
$\rightarrow_{G_3}$). 

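Definition 1. Let $P$ be a logical program and $M$ be a set of atoms. $M$ is a 
G'3-stable model of $P$ if $\bigwedge_{G'_3}(P \cup \neg_{G'_3}\widetilde{M})$ 
is consistent and $P \cup \neg_{G'_3}\widetilde{M} \models_{G'_3} M$ 
(in the sense that 
$\models_{G_3} \bigwedge_{G'_3}(P \cup \neg_{G'_3}\widetilde{M}) \rightarrow_{G_3} \bigwedge_{G'_3} M$). 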

4 Note that $\wedge_{G_3}$ corresponds to $\wedge_L$ and $\vee_{G_3}$ 
corresponds to $\vee_L$. 

Table 1. Abbreviation Forms in L3 

Definition 2. Let $P$ be a logical program based on basic formulas and $M$ be a 
set of atoms. $M$ is an L3-WFS model of $P$ if $M$ is a G'3-stable model of $P$. 

The following is a well known result (see [16, 17, 18]), which characterizes 
stable models for propositional theories in terms of G3: 

Theorem 1. Let $P$ be a logical program and $M$ be a set of atoms. $M$ is a 
stable model of $P$ iff 
$P \cup \neg_{G_3}\widetilde{M} \cup \neg_{G_3}\neg_{G_3}M \Vdash_{G_3} M$. 

But if we reduce the class of programs to disjunctive programs the definition 
becomes (see [18]): 

Theorem 2. Let $P$ be a disjunctive program and $M$ be a set of atoms. $M$ is a 
stable model of $P$ iff $P \cup \neg_{G_3}\widetilde{M} \Vdash_{G_3} M$. 

This is analogous to the previous definition using G'3. Since G3 and G'3 can 
be expressed in terms of L3 using the abbreviations in Table 1, we have two 
different semantics: stable semantics and L3-WFS respectively. 



5 Results 

We first show that L3-WFS is different from some well known semantics, then 
present its characterization using L3, and finish this section with a brief 
comment on well behaved semantics. 

5.1 Comparing L3-WFS with Other Semantics 

Consider the EWFS semantics, the CUT rule and the following example, all of 
them taken from [6]: 

    $\neg a \rightarrow a$, $\neg x \wedge a \rightarrow b$, $\neg b \rightarrow y$, $\neg y \rightarrow z$ 

Here EWFS($P$) = $\{a, b, \neg x\}$, however EWFS($P \cup \{b\}$) = 
$\{a, b, z, \neg x, \neg y\}$. This example shows that EWFS does not satisfy 
CUT. L3-WFS($P$) = $\{a, b, z, \neg x, \neg y\}$ as well as 
L3-WFS($P \cup \{b\}$) = $\{a, b, z, \neg x, \neg y\}$. Hence L3-WFS is 
different from EWFS. 

Consider the following two program examples taken from [6]: 

    $\neg b \rightarrow p$, $c \rightarrow b$, $(p \wedge \neg a) \rightarrow c$, $\neg b \rightarrow a$ 

and 

    $\neg b \rightarrow p$, $(p \wedge \neg a) \rightarrow b$, $\neg b \rightarrow a$ 

One may expect the same semantics of both programs w.r.t. the common 
language. However, GWFS infers $p$ in the first program, but it does not in the 
second program. L3-WFS gives the same answer in both programs, which consists 
in deriving only $\neg c$. Hence, L3-WFS is different from GWFS. 

Consider the following program example taken from [8]: 

    $\neg b \rightarrow a$, $\neg a \rightarrow b$, $\neg a \rightarrow x$, $\neg b \rightarrow x$ 

Note that WFS+($P$) = $\{\}$, but AS-WFS($P$) = $\{x\}$. 

In [8] we have that WFS+ is a stronger extension of WFS; then, comparing 
the semantics we proposed and the results obtained in [6], we have that WFS 
$\leq$ L3-WFS $\leq$ STABLE. This can be formalized as follows: 

Lemma 1. Let $P$ be a normal program, then WFS($P$) $\leq$ L3-WFS($P$) $\leq$ 
STABLE($P$). 

5.2 Well Behaved Semantics 

We conjecture that L3-WFS satisfies all principles involved in the definition of 
a well behaved semantics as long as we refuse to interpret $P \cup M$ as $P^M$. 
Note that the notion $P^M$ is a syntactic transformation, not required when 
$P \cup M$ has a logical meaning. Take for instance the following program $P$ 
from [7]: $b \rightarrow a$, $\neg a \rightarrow b$. This example is used to 
show that WFS+ does not satisfy the Extended Cut principle. While WFS+($P$) = 
$\{a, \neg b\}$, WFS+($P \cup \{\neg b\}$) = $\{\neg a, \neg b\}$. Moreover, 
$\{\neg a, \neg b\}$ is neither a 2-valued model, nor a 3-valued model of the 
program $P \cup \{\neg b\}$. However, this happens because WFS+ does not have 
a “logical” definition for the semantics of programs extended with constraints 
(negated formulas). Hence, Dix interprets $P \cup \{\neg b\}$ as 
$P^{\{\neg b\}}$, where $P^{\{\neg b\}} := b \rightarrow a$. Now 
$\{\neg a, \neg b\}$ is a model of $P^{\{\neg b\}}$. L3-WFS has a definition 
for the semantics of any basic propositional theory that allows the use of 
constraints. In this example we get L3-WFS($P$) = ASP-WFS($P \cup \{\neg b\}$) 
= $\{a, \neg b\}$. Hence, we propose to reconsider Dix’s work on well-behaved 
semantics, towards a direction of making it more general and logic based. 

6 Related Work (Other Modal and Many Valued Logic 
Characterizations) 

In [15] we presented some results about characterizations of stable models and 
extensions of WFS. Such characterizations, in terms of the modal logic S4 and 
the four-valued bilattice FOUR, as well as Gelfond's characterization of stable 
models in S5, are analogous to the characterization in G'3. 

6.1 Via S5 

Consider modal logic S5 with its standard connectives that we will denote as: 
$\sim$, $\rightarrow$, $\vee$ and $\wedge$. McDermott and Doyle introduced a 
non-monotonic version of S5. They define the $X$-expansions $E$ of a theory $T$ 
as those sets satisfying the equation: 

    $E = Cn_X(T \cup \{\sim\Box\phi : \phi \notin E\})$    (1) 

where $Cn_X$ is the inference operation of the modal logic $X$. Depending on 
the approach, an arbitrarily selected $X$-expansion for $T$ or the intersection 
of all $X$-expansions for $T$ is considered as a set of nonmonotonic 
consequences of $T$. McDermott proved that S5 coincides with its non-monotonic 
version, hence it is not very interesting. However, he considered formulas to 
complete the theory. If we only consider adding simple formulas of the form 
$\sim\Box a$ ($a$ an atom), the story changes. In fact, Gelfond [10] was able 
to characterize stable models of stratified normal programs using this idea and 
the following translation: A normal clause: 

    $A_1 \wedge \ldots \wedge A_m \wedge \neg A_{m+1} \wedge \ldots \wedge \neg A_n \rightarrow A_0$    (2) 

becomes 

    $A_1 \wedge \ldots \wedge A_m \wedge \sim\Box A_{m+1} \wedge \ldots \wedge \sim\Box A_n \rightarrow A_0$    (3) 



6.2 Via S4 

Now consider modal logic S4 with its standard connectives. Let $\neg a$ be the 
abbreviation form of the modal formula $\sim\Box a$. Gelfond in [10] gives a 
definition of a semantics similar to AS-WFS, but it covers only the class of 
stratified programs; we generalize this concept in the following definition: 

Definition 3. Let $P$ be a theory based on basic formulas and $M$ be a set of 
atoms. We define $M$ to be an AS-WFS model of $P$ iff 
$P \cup \neg\widetilde{M} \Vdash_{S4} M$. We denote the sceptical semantics of 
$P$ as AS-WFS($P$). 

Hence, this definition opens the research line of defining other WFS
extensions via different modal logics. For example, modal logic K behaves
‘closer’ (but still differently) to the stable semantics. Consider the
following example: ¬a → b, ¬b → a, ¬p → a, ¬p → p. Then AS-WFS has two models,
namely {a, p} and {b, p}. But using modal logic K we obtain no models.

Lemma 2. Let P be a theory based on basic formulas and M be a set of atoms.
Then M is an AS-WFS model of P iff P ∪ ¬M ⊩_S4 M.

It is well known that STABLE and WFS agree on the class of normal stratified
programs. Hence, we have the following result.

Corollary 1. If P is a stratified normal program, WFS(P) = AS-WFS(P) =
STABLE(P).
Table 3. FOUR_S5-valuation

Table 4. Abbreviation forms in FOUR

6.3 Via FOUR

The logical role that the four-valued structure has among Ginsberg’s well-known
bilattices is similar to the role that the two-valued algebra has among Boolean
algebras. Four-valued semantics is a very suitable setting for computerized
reasoning according to Belnap, and in fact the original motivation of Ginsberg
for introducing bilattices was to provide a uniform approach for a diversity of
applications in AI. Bilattices were further investigated by Fitting, who showed
that they are also useful for providing semantics to logic programs; hence our
interest is focused on relating FOUR with ASP-WFS.

The FOUR-Valuation Bilattice. Belnap introduced a logic for dealing in
a useful way with inconsistent and incomplete information. This logic is based
on a structure called FOUR, see [3]. This structure has four truth values: the
classical t and f, and two new ones, ⊥, which intuitively denotes lack of
information (no knowledge), and ⊤, which indicates inconsistency
(“over”-knowledge). These values have two different natural orderings.
Measuring the truth: the minimal element is f, the maximal element is t, and
the values ⊤ and ⊥ are incomparable. Reflecting differences in the amount of
knowledge or information: the minimal element is ⊥, the maximal element is ⊤,
and the values f and t are incomparable. We read the bilattice FOUR identifying
⊥ as 0, ⊤ as 3, f as 1 and t as 2. We define the valuation of the operators →
and □ as in table 4. It is important to note that these connectives are
abbreviation forms using the standard language of FOUR, as shown in table 6.3.
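The two orderings of FOUR can be made concrete with the numeric reading given above (⊥ as 0, f as 1, t as 2, ⊤ as 3). The sketch below is only an illustration of the two partial orders. The names `leq_truth`, `leq_knowledge` and `join` are assumptions of this example, and the valuation tables for → and □ are not reproduced here.

```python
# Numeric reading used in the text: ⊥ = 0, f = 1, t = 2, ⊤ = 3.
BOT, F, T, TOP = 0, 1, 2, 3
VALUES = (BOT, F, T, TOP)


def leq_truth(x, y):
    """Truth order: f is minimal, t is maximal, ⊥ and ⊤ are incomparable."""
    return x == y or x == F or y == T


def leq_knowledge(x, y):
    """Knowledge order: ⊥ is minimal, ⊤ is maximal, f and t are incomparable."""
    return x == y or x == BOT or y == TOP


def join(leq, x, y):
    """Least upper bound of x and y with respect to the order `leq`."""
    uppers = [u for u in VALUES if leq(x, u) and leq(y, u)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))


print(join(leq_truth, BOT, TOP))    # least upper bound in the truth order
print(join(leq_knowledge, F, T))    # least upper bound in the knowledge order
```

In the truth order the join of the two incomparable values ⊥ and ⊤ is t, while in the knowledge order the join of f and t is ⊤, which is what makes the structure a bilattice rather than a single chain.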

As before, we define our main negation operator (¬): ¬p as ∼□p.

Using the reading of FOUR and the valuation defined in table 4, tautologies
will be the formulas whose truth value is 3. Examples of some tautologies
are: (¬a → a) → a, a ∨ ¬a, ¬¬a → a, a → a. Note that a → ¬¬a is not a
tautology.

Theorem 3. Let P be a theory based on basic formulas and M be a set of atoms.
M is an AS-WFS model of P iff P ∪ ¬M ⊩_FOUR M.

6.4 Models in L 3 and FOUR 

In order to compare the L3-WFS and AS-WFS semantics it is necessary to study
models in terms of G3 and FOUR_S5. Here we have some preliminary results:

Theorem 4. Let P be a theory based on basic formulas and M be a set of atoms.
If M is a G3-WFS model of P, then there exists a FOUR_S5 model M' of P.

The proof is by induction on the length of the formula.

Corollary 2. If a is a tautology in FOUR_S5 then a is a tautology in G3.



7 Conclusions 

There is still current interest in extensions of WFS ([5,8]). However, if one
really believes in a logic-based approach to program development, a logical
framework is needed in which to define such extensions. We propose an approach
to define extensions of the WFS semantics based on completions, in the same
spirit as STABLE, hence closing the gap between both approaches. As a result,
we gain a better understanding of those semantics as well as of the relation
among them.

We have that L3-WFS is sound with respect to the stable model semantics and
it can be used to approximate stable entailment. There is still work to be done
with respect to this semantics; our future work is to study it in more depth
in order to see which properties of well-behaved semantics it satisfies.
Abstract. We develop some ideas in order to obtain a nonmonotonic
reasoning system based on the modal logic S4. As a consequence we show 
how to express the well known answer set semantics using a restricted 
fragment of modal formulas. Moreover, by considering the full set of 
modal formulas, we obtain an interesting generalization of answer sets 
for logic programs with modal connectives. We also depict, by the use of 
examples, possible applications of this inference system. 

It is also possible to replace the modal logic S4 with any other modal 
logic to obtain similar nonmonotonic systems. We even consider the use 
of multimodal logics in order to model the knowledge and beliefs of agents 
in a scenario where their ability to reason about each other’s knowledge is 
relevant. Our results clearly state interesting links between answer sets, 
modal logics and multi-agent systems. 



1 Introduction 

The stable model semantics, since its introduction in 1988 by Gelfond and Lif- 
schitz [1], has been recognized as a novel contribution to the communities of 
nonmonotonic reasoning and logic programming. Research groups in several in- 
stitutions are developing theory and applications related to this semantics. 

This logic programming paradigm evolved into what is known today as Answer
Set Programming (ASP). The basic idea of ASP is to provide a formal
system that assigns to each logic program, the description of a problem, some
set of desirable models, hopefully the problem solutions, called answer sets.
The development of efficient implementations of answer set finders has also
allowed the creation of several applications, ranging from planning and solving
combinatorial problems to verification, logical agent systems and product
configuration.

Modal logic originated, on the other hand, as a consequence of the study
of notions such as “necessary” and “possible”. Extending the syntax of logic
formulas with new unary connectives □ and ◇, and giving an adequate semantical
meaning to them, it is possible to define formal systems to model knowledge,
tense and obligation. Modal formulas usually have a very natural reading close
to their intended intuitive meaning. This is one of the reasons why they have
been quite useful in providing foundations for several applications in
knowledge representation, multi-agent systems, etc.
In this paper we use the modal logic S4 in a general framework to model
nonmonotonic reasoning. This is possible due to a characterization of answer 
sets in terms of intuitionistic logic that we have recently provided. This result, 
as well as several properties and consequences, are presented in [7-9]. Some other 
ideas relating modal logics and logic programs are also discussed in [6]. 

Moreover, we prove that the ASP approach is embedded in our proposed S4
nonmonotonic semantics. This is not really a surprise since, by virtue of the
characterization of answer sets, intuitionistic logic can be embedded into modal
logic S4 (thanks to Gödel’s well-known translation).

We propose the following interpretation for our system: Consider a logic agent 
with some modal theory as its base knowledge. The agent could use the logic 
S4, a logic of knowledge, in order to do inference and produce new knowledge. 
However, we would also like our agent to be able to do nonmonotonic reasoning. 

Informally speaking, we will allow our agent to suppose some simple acceptable
knowledge in order to make more inferences. This simple acceptable knowledge
consists of formulas of the form □◇F, where F is a formula containing only
unary connectives. Such a formula could be read as: “the agent knows that the
fact F is possible”. If the agent can justify all his assumptions (i.e. prove
all these formulas F) and obtain some sort of complete explanation for his base
knowledge, then we say it is safe for him to believe this new information.

The fact that nonmonotonic reasoning can be done via S4 has already been
shown in [13]. Our approach is different and follows a line of research,
originally proposed by Pearce [10], that tries to find relations among
intuitionistic logic, modal logics and answer sets. Most of the results by
Pearce were developed for disjunctive logic programs and, as one of the
contributions of this paper, we now present some generalizations for
propositional theories.

Our paper is structured as follows: In Section 2 we briefly introduce the 
syntax of propositional modal logic and the basics of S4. In Section 3 we present 
our framework for doing nonmonotonic reasoning using the logic S4. In Section 4 
we introduce a previous result that relates answer sets with intuitionistic logic 
and, in Section 5, we establish the relations found with respect to the logic S4. 
Finally we present in Section 6 some ideas on how to develop an ASP approach 
for multi-agent systems formulating an example with two different agents. We 
finish with some conclusions in Section 7. 



2 Background 

In this section we briefly introduce some basic concepts and definitions that
will be used throughout this paper. We introduce the language of propositional
modal logic and the proof theory of the modal logic S4.



2.1 Propositional Modal Logic 

We use the set of propositional modal formulas in order to describe rules and
information within logic programs. Formally we consider a language built from
an alphabet containing: a denumerable set ℒ of elements called atoms or atomic
formulas; the binary connectives ∧, ∨ and → to denote conjunction, disjunction
and implication respectively; the unary connective □ as a knowledge operator;
the 0-ary connective ⊥ to denote falsity; and the auxiliary symbols (, ).

Formulas can be constructed as usual in logic. The negation ¬F can be
introduced as an abbreviation of the formula F → ⊥, the belief operator ◇F
abbreviates ¬□¬F and, similarly, the truth symbol ⊤ stands for ¬⊥. We
can also write, as usual, F ↔ G to denote the formula (F → G) ∧ (G → F).
Finally, the formula G ← F is just another way of writing F → G.

A modal theory, or modal program, is a set of modal formulas; we restrict our
attention, however, to finite theories. For a given theory T its signature,
denoted ℒ_T, is the set of atoms that occur in the theory T. Observe that,
since we consider finite theories, their signatures are also finite. Given a
theory T we also define the negated set ¬T = {¬F | F ∈ T} and the knowledge set
□T = {□F | F ∈ T}.

A literal is either a formula of the form a (positive literal) or ¬a (negative
literal) where a is an atom. Given a theory T we use Lit_T = ℒ_T ∪ ¬ℒ_T to
denote the set of all literals that are relevant to T. If, for instance, we have
the theory T = {¬a → b} then Lit_T = {a, ¬a, b, ¬b}.
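The signature and the set of relevant literals are easy to compute once formulas are given a concrete representation. In the sketch below, atoms are strings and compound formulas are nested tuples such as `("imp", F, G)` or `("not", F)`. This encoding and the names `atoms` and `literals` are assumptions made for illustration only.

```python
def atoms(formula):
    """Set of atoms occurring in a formula (its contribution to the signature)."""
    if isinstance(formula, str):
        # The string "bot" stands for the 0-ary connective ⊥, which is not an atom.
        return set() if formula == "bot" else {formula}
    # A compound formula is a tuple: connective name followed by subformulas.
    return set().union(*(atoms(sub) for sub in formula[1:]))


def literals(theory):
    """Lit_T: every atom of the signature together with its negation."""
    signature = set().union(*(atoms(f) for f in theory))
    return signature | {("not", a) for a in signature}


# The example theory T = {¬a → b} from the text.
T = [("imp", ("not", "a"), "b")]
print(literals(T))
```

For the example theory this yields the four literals a, ¬a, b and ¬b, in agreement with the set Lit_T given above.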

2.2 Modal Logic S4 

Modal logic was originally conceived as the logic of necessary and possible. The 
logic S4 can be defined as the Hilbert type proof system that contains the fol- 
lowing axiom schemes: 

1. (F → (G → H)) → ((F → G) → (F → H))        4. □(F → G) → (□F → □G)

2. F → (G → F)                                5. □F → □□F

3. ¬¬F → F                                    6. □F → F

and is closed under the rule of Necessitation (from F we can derive □F) as well
as Modus Ponens (from F and F → G we can derive G). The behavior of other
connectives follows from their usual definition in classical logic.

We use the standard notation ⊢ F to denote that F is a provable formula
in the logic S4. If T is a theory we understand the symbol T ⊢ F to mean that
⊢ F_1 ∧ ··· ∧ F_n → F for some F_i contained in T. Similarly, given a theory U,
we use the symbol T ⊢ U to denote T ⊢ F for every F ∈ U. A theory T is said to
be consistent, with respect to the logic S4, if it is not the case that T ⊢ ⊥.
We use the notation T ⊩ U to stand for the phrase: T is consistent and T ⊢ U.
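The paper works with the Hilbert-style proof system only. As a complementary illustration, the sketch below uses the standard possible-worlds (Kripke) reading, which is an assumption of this example and not part of the text: S4 is sound for frames whose accessibility relation is reflexive and transitive, so axioms 5 and 6 must hold at every world of such a frame. The small frame, the tuple encoding of formulas and all function names are hypothetical.

```python
from itertools import product

WORLDS = (0, 1, 2)
# Reflexive, transitive accessibility relation: the chain 0 -> 1 -> 2.
ACCESS = {(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)}


def holds(valuation, world, formula):
    """Truth of a formula at a world; atoms are strings, compounds are tuples."""
    if isinstance(formula, str):
        return formula in valuation[world]
    op = formula[0]
    if op == "not":
        return not holds(valuation, world, formula[1])
    if op == "imp":
        return (not holds(valuation, world, formula[1])) or holds(valuation, world, formula[2])
    if op == "box":
        return all(holds(valuation, v, formula[1]) for v in WORLDS if (world, v) in ACCESS)
    raise ValueError(f"unknown connective: {op}")


def valid_on_frame(formula, atom="p"):
    """Check the formula at every world under every valuation of a single atom."""
    for bits in product([False, True], repeat=len(WORLDS)):
        valuation = {w: ({atom} if bit else set()) for w, bit in zip(WORLDS, bits)}
        if not all(holds(valuation, w, formula) for w in WORLDS):
            return False
    return True


def box(f):
    return ("box", f)


def diamond(f):
    return ("not", box(("not", f)))  # ◇F abbreviates ¬□¬F


AXIOM_5 = ("imp", box("p"), box(box("p")))            # □F → □□F
AXIOM_6 = ("imp", box("p"), "p")                      # □F → F
S5_COLLAPSE = ("imp", diamond(box("p")), box("p"))    # ◇□F → □F, valid in S5

print(valid_on_frame(AXIOM_5), valid_on_frame(AXIOM_6), valid_on_frame(S5_COLLAPSE))
```

On this frame the two S4 axioms hold under every valuation, while the S5 principle ◇□F → □F fails (for instance when p is true only in world 2), so the frame separates S4 from S5.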

3 Non-monotonic Reasoning in S4 

In this section, we introduce our proposed nonmonotonic inference system using 
S4, in particular the notion of a weakly complete and consistent extension for 
a given theory. The idea of our approach is as follows: “If we cannot derive a
formula F by standard inference in S4 from a theory T, we could try to derive
the formula F from a ‘suitable’ extension of the theory T.”
The first basic requirement of this extended theory is to be consistent.
Second, the extra formulas that we include should be a sort of ‘weak’
assumptions. Any formula of the form □◇F (where F is any formula containing
only 1-place connectives) is considered a ‘weak’ assumption. Such a formula
just says that it is known that something is possible. As noted before, we
borrow such a formula if it helps us to obtain a consistent explanation of the
world.

Definition 1. Let P be any modal theory and let M ⊆ Lit_P. The modal closure
of M is defined as M̄ = ¬(Lit_P \ M) ∪ □M. Then □M is an S4-answer set of the
theory P if P ∪ □◇M̄ ⊩_S4 □M. Moreover, P ∪ □◇M̄ is called a weakly complete
and consistent extension of P.
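The modal closure in Definition 1 is a simple set construction, which the following sketch spells out on formula strings. It only builds the closure and the corresponding weak assumptions; it does not attempt S4 theorem proving, so it cannot by itself decide whether a set is an S4-answer set. The function names and the string encoding are assumptions of this example.

```python
def modal_closure(lit_p, m):
    """Modal closure of M: negate every literal outside M, box every literal in M."""
    negated_rest = {"¬" + literal for literal in lit_p - m}
    known = {"□" + literal for literal in m}
    return negated_rest | known


def weak_assumptions(closure):
    """Prefix every formula of the closure with □◇, as in Definition 1."""
    return {"□◇" + formula for formula in closure}


# Literals over the atoms a and b, with the candidate set M = {b}.
LIT_P = {"a", "¬a", "b", "¬b"}
closure = modal_closure(LIT_P, {"b"})
print(sorted(closure))
print(sorted(weak_assumptions(closure)))
```

For M = {b} the closure consists of ¬a, ¬¬a, ¬¬b and □b; these are the formulas that, prefixed with □◇, are added to the program when testing whether □M is an S4-answer set.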

Consider the program P = {□(¬□a → b)} with Lit_P = {a, ¬a, b, ¬b}. Observe
that {□b} is an S4-answer set since P ∪ □◇{¬¬a, ¬a, ¬¬b, □b} is consistent and
proves, under S4, the formula □b. This program has no more answer sets. We
could also consider the following more interesting example:

1. Juan is Mexican.

2. Mary is American.

3. Pablo is Mexican and not Catholic.

4. It is known that normally Mexicans are Catholic.

We can encode this problem using the following program:

□(mexican(juan)).

□(american(mary)).

□(mexican(pablo) ∧ ¬catholic(pablo)).

□((mexican(X) ∧ ¬□¬catholic(X)) → catholic(X)).

Note that the sentence “It is known that normally Mexicans are Catholic” is
encoded by the last rule, which says: it is known that if ‘it is not known
that X is not Catholic’ and ‘X is Mexican’ then ‘X is Catholic’. Of course, we
need our extended nonmonotonic S4 (not just S4) to make this example work, as
we explain immediately. The unique S4-answer set of this program is:

□{¬catholic(pablo), mexican(pablo), mexican(juan),
catholic(juan), american(mary)}

So, we know that Pablo is not Catholic. We also know that Juan is Catholic.
However, this knowledge comes from nonmonotonic reasoning. This kind of
knowledge, not derived by the regular inference procedure of S4, can be
considered as a “weak knowledge” or a “strong belief”, so to speak. The
S4-answer set does not decide whether Mary is Catholic or not, since the
program does not provide any clue about this fact.
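For comparison, the same default can be written as an ordinary ground logic program and solved with the classical Gelfond-Lifschitz reduct. This is a different technique from the S4 construction of Definition 1 and is shown only as a sanity check of the expected result. The encoding (writing the classically negated atom as the plain atom name `-catholic(x)`) and all function names are assumptions of this sketch.

```python
from itertools import combinations


def least_model(definite_rules):
    """Least model of a negation-free program given as (head, positive body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, positive in definite_rules:
            if head not in model and all(a in model for a in positive):
                model.add(head)
                changed = True
    return model


def stable_models(rules):
    """Brute-force stable models of rules given as (head, positive, negative)."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, pos, neg in rules for a in pos + neg})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            # Reduct: drop rules blocked by m, then drop the remaining negative bodies.
            reduct = [(h, pos) for h, pos, neg in rules if not any(a in m for a in neg)]
            if least_model(reduct) == m:
                found.append(m)
    return found


PEOPLE = ["juan", "mary", "pablo"]
RULES = [
    ("mexican(juan)", [], []),
    ("american(mary)", [], []),
    ("mexican(pablo)", [], []),
    ("-catholic(pablo)", [], []),
] + [(f"catholic({x})", [f"mexican({x})"], [f"-catholic({x})"]) for x in PEOPLE]

for model in stable_models(RULES):
    print(sorted(model))
```

The program has a single stable model, containing catholic(juan) but neither catholic(pablo) nor anything about Mary's religion, which mirrors the S4-answer set given above.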

Why modal logic S4? The S4 and S5 systems are perhaps the two best-known
systems to represent the notions of knowledge and possibility. The
theorem that we present in the next section, however, fails if we use S5
instead of S4. The problem is that S5 has no “irreducible iterated modalities”.
In S5, □F is equivalent to ◇□F. We need, however, to distinguish these two
formulas in order to gain expressiveness. Many other logics between S4 and S5
behave nicely with respect to our approach.
4 Expressing Answer Sets

The answer set semantics is a popular semantic operator for logic programs. One
of the main features of answer sets is the introduction of negation as failure,
which is extremely useful to model notions such as nonmonotonic reasoning,
default knowledge and inertial rules. The definition of answer sets is not
required for the purposes of this paper; the reader is referred to [8] for more
details. It is just important to mention that answer sets are defined for
augmented logic programs, a class of propositional logic programs (without
modal connectives) that incorporates an additional classical negation
connective, denoted ~, which can only be used preceding atomic occurrences.

4.1 Logical Foundations of A-Prolog 

The characterization of answer sets in terms of intermediate logics is an im- 
portant result that provides solid logical foundations to this paradigm. Pearce 
initiated this line of research using intuitionistic extensions, obtained by adding 
negated atoms, to characterize answer sets for disjunctive logic programs [12]. 
This procedure, however, is not able to obtain the answer sets of logic programs 
containing negation in the head [8]. 

Alternatively, Pearce developed another approach using extensions of theories
based on the logic HT, and showed that they are equivalent to the equilibrium
logic also formulated by himself [11,12]. In a more recent paper [5], together
with Lifschitz and Valverde, Pearce was able to show that equilibrium models
can be used to obtain the answer sets of augmented logic programs.

Following the original idea from Pearce we were able to show that a char- 
acterization of answer sets for augmented programs is also possible in terms of 
intuitionistic logic. We do consider extensions with negated atoms, as Pearce 
did, but also allow double negated atoms in our intuitionistic extensions. 

Pearce himself suggested how modal logics could be used to model answer sets
for augmented programs as a consequence of his results on Nelson’s logic. Our
contribution in this paper is to fill in the details and explore possible
advantages of this approach. The following theorem was stated and proved in [8].

Theorem 1. Let P be an augmented program and let M ⊆ ℒ_P. M is an answer
set of P if and only if P ∪ ¬(ℒ_P \ M) ∪ ¬¬M ⊩_I M.

The previous theorem assumes that augmented programs do not contain classical
negation; recall that the role of classical negation is not defined in the
context of intuitionistic logic. This assumption, however, does not impose any
important restriction, since classical negation can be easily simulated, as
will be explained in Section 4.2. On the other hand, this result suggests a
natural way to extend the notion of answer sets to any propositional theory T.
The idea was also proposed in [8] and used later to develop the safe belief
semantics in [9].

Definition 2. Let P be any propositional theory and let M ⊆ ℒ_P. We define
M to be an answer set of P if and only if P ∪ ¬(ℒ_P \ M) ∪ ¬¬M ⊩_I M.
We have also shown that the answer set semantics is invariant under any
intermediate logic and generalized this approach to include extensions of the
form ¬¬F, with F any formula, providing a more general framework to study
and define semantics, see [9]. These results are (in our view) good
evidence of the good behavior of the answer set semantics in logical terms.

4.2 Restoring Classical Negation 

We will now show how to restore the use of classical negation in our logic
programs. Recall that ~ is only allowed preceding atomic formulas so that,
intuitively, we may think of the formula ~a just as an atom with a convenient
name, so that an answer set finder can discard models where both a and ~a
appear.

Formulas of the form a and ~a are referred to as ~literals, and we say that a
set of ~literals M is consistent if it is not the case that both a and ~a are
contained in M for some atom a. We will also use the terms asp-formulas,
asp-theories and asp-programs to denote entities that allow the use of
classical negation. We also define, if M is a set of atoms, ~M = {~a | a ∈ M}.
Moreover, for an asp-program P its extended signature will be ℒ̃_P = ℒ_P ∪ ~ℒ_P.

Definition 3. Given a signature ℒ, let ℒ′ be another signature with the same
cardinality as ℒ and with ℒ ∩ ℒ′ = ∅. Also let f: ℒ → ℒ′ be a bijective
function between the two signatures. We define the mapping ⁺ from asp-formulas
over the signature ℒ to formulas over ℒ ∪ ℒ′ recursively as follows:

1. (⊥)⁺ = ⊥.

2. for any atom a let (a)⁺ = a and (~a)⁺ = f(a).

3. for any pair of formulas F, G let (F ⊙ G)⁺ = F⁺ ⊙ G⁺ where ⊙ ∈ {∧, ∨, →}.

The definition of ⁺ is also extended to theories as usual, T⁺ = {F⁺ | F ∈ T}.
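The mapping of Definition 3 simply renames every classically negated atom to a fresh atom and leaves everything else untouched. A minimal sketch follows, assuming formulas are encoded as nested tuples, with `("neg", a)` for ~a and the string `"bot"` for ⊥. The encoding, the function name `plus` and the primed names used for the fresh atoms are all assumptions of this example.

```python
def plus(formula, f):
    """Apply the mapping of Definition 3; `f` maps each atom to its fresh copy."""
    if isinstance(formula, str):
        return formula  # atoms and "bot" are left unchanged
    op = formula[0]
    if op == "neg":
        atom = formula[1]
        if not isinstance(atom, str):
            raise ValueError("classical negation may only precede an atom")
        return f[atom]  # (~a)+ = f(a)
    # Binary connectives: translate both sides and keep the connective.
    return (op,) + tuple(plus(sub, f) for sub in formula[1:])


# Hypothetical bijection onto fresh, primed atoms.
F_MAP = {"a": "a'", "b": "b'", "c": "c'"}
rule = ("imp", ("and", "a", ("neg", "b")), "c")   # a ∧ ~b → c
print(plus(rule, F_MAP))
```

Since negation as failure ¬F abbreviates F → ⊥, it needs no separate case: it is handled by the clause for the binary connectives.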

Lemma 1. Let P be an augmented asp-program and let M ⊆ ℒ̃_P be a consistent
set of ~literals. Let ⁺ be a mapping defined with the corresponding set ℒ′_P
and a function f: ℒ_P → ℒ′_P. M is an answer set of P if and only if M⁺ is an
answer set of P⁺.

Proof. Follows as a direct generalization of Proposition 2 in [2].

Theorem 2. Let P be an augmented asp-program, and let M be a consistent set
of ~literals. M is an answer set of P iff P⁺ ∪ ¬(ℒ_{P⁺} \ M⁺) ∪ ¬¬M⁺ ⊩_I M⁺.

Proof. Follows immediately by Theorem 1 and the previous lemma.



5 Characterization of Answer Sets Using S4 

The purpose of the following translation, given in [3], is to provide a meaning 
to any propositional theory including classical negation with a broader role, and 
not only for the class of augmented programs. We can consider, for instance, the 
use of the classical negation connective for any arbitrary formula and not only 
for single atoms. We will see that propositional modal formulas are expressive 
enough to model the two kinds of negations in asp-programs. 
Definition 4. The translation ° of asp-formulas to modal formulas is defined
recursively as follows:

(a)° = □a, for atomic a            (a)* = □¬a, for atomic a
(F ∨ G)° = F° ∨ G°                 (F ∨ G)* = F* ∧ G*
(F ∧ G)° = F° ∧ G°                 (F ∧ G)* = F* ∨ G*
(F → G)° = □(F° → G°)              (F → G)* = F° ∧ G*
(¬F)° = □¬F°                       (¬F)* = F°
(~F)° = F*                         (~F)* = F°

The definition is also extended to sets of asp-formulas, T° = {F° | F ∈ T}.
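The two mutually recursive translations of Definition 4 can be written down directly. The sketch below assumes the same nested-tuple encoding of formulas used for illustration elsewhere (`"not"` for ¬, `"neg"` for ~, `"box"` for □). The function names `circ` and `star` are hypothetical, and every string is treated as an atom, since the definition gives no separate clause for ⊥.

```python
def circ(formula):
    """The translation F° of Definition 4."""
    if isinstance(formula, str):
        return ("box", formula)                       # (a)° = □a
    op = formula[0]
    if op in ("and", "or"):
        return (op, circ(formula[1]), circ(formula[2]))
    if op == "imp":
        return ("box", ("imp", circ(formula[1]), circ(formula[2])))
    if op == "not":
        return ("box", ("not", circ(formula[1])))     # (¬F)° = □¬F°
    if op == "neg":
        return star(formula[1])                       # (~F)° = F*
    raise ValueError(f"unknown connective: {op}")


def star(formula):
    """The auxiliary translation F* of Definition 4."""
    if isinstance(formula, str):
        return ("box", ("not", formula))              # (a)* = □¬a
    op = formula[0]
    if op == "or":
        return ("and", star(formula[1]), star(formula[2]))
    if op == "and":
        return ("or", star(formula[1]), star(formula[2]))
    if op == "imp":
        return ("and", circ(formula[1]), star(formula[2]))
    if op in ("not", "neg"):
        return circ(formula[1])                       # (¬F)* = (~F)* = F°
    raise ValueError(f"unknown connective: {op}")


print(circ(("neg", "a")))                 # ~a
print(circ(("imp", ("not", "a"), "b")))   # ¬a → b
```

For a program without classical negation only `circ` is ever called, which is the case in which the translation coincides with Gödel's embedding of intuitionistic logic into S4.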



Observe that our translation behaves just like Gödel’s translation of
intuitionistic logic into S4 for programs without classical negation. Due to
the well-known Gödel embedding of intuitionistic logic in S4 and Theorem 2, we
can express answer sets for augmented programs, where the role of classical
negation is “passive” (because it is only applied to atoms).

Theorem 3. Let P be an augmented asp-program, and let M be a consistent
set of ~literals. The set M is an answer set of the logic program P if and only
if (P⁺ ∪ ¬(ℒ_{P⁺} \ M⁺) ∪ ¬¬M⁺)° ⊩_S4 M⁺°.

In the rest of this section we discuss an alternative representation of answer 
sets in order to model an active role of classical negation. 

Definition 5. Let ℒ and ℒ₁ be two disjoint signatures and let f be a bijective
function from ℒ to ℒ₁. We define a mapping ⁻ from modal formulas with signature
ℒ ∪ ℒ₁ to formulas in ℒ recursively as follows:

1. (⊥)⁻ = ⊥.

2. for any atom a let (a)⁻ = a if a ∈ ℒ and (a)⁻ = ¬f⁻¹(a) otherwise.

3. for any pair of formulas F, G let (F ⊙ G)⁻ = F⁻ ⊙ G⁻ where ⊙ ∈ {∧, ∨, →}.

4. for any formula F let (□F)⁻ = □F⁻.

We also extend our translation over theories: T⁻ = {F⁻ | F ∈ T}.

Lemma 2. If P is an augmented asp-program, defined over ℒ, then P⁺°⁻ = P°.

Proof. Without loss of generality it suffices to consider a singleton program.
Then, it suffices to apply a direct induction on the size of the formula.

Theorem 4. Let P be an augmented asp-program, and let M be a consistent
set of ~literals. Then M is an answer set of P iff M° is an S4-answer set of P°.

Proof. Let ℒ′_P be a new signature with the same cardinality as ℒ_P and such
that ℒ_P ∩ ℒ′_P = ∅. Let f: ℒ_P → ℒ′_P be a bijective function. We have then
that a consistent set of ~literals M is an answer set of P:

iff P⁺ ∪ ¬(ℒ_{P⁺} \ M⁺) ∪ ¬¬M⁺ ⊩_I M⁺, by Theorem 2.

iff (P ∪ ¬(ℒ̃_P \ M) ∪ ¬¬M)⁺ ⊩_I M⁺, by the definition of the mapping ⁺.

iff (P ∪ ¬(ℒ̃_P \ M) ∪ ¬¬M)⁺° ⊩_S4 M⁺°, by Gödel’s embedding of I into S4.

iff (P ∪ ¬(ℒ̃_P \ M) ∪ ¬¬M)⁺°⁻ ⊩_S4 M⁺°⁻, by a proof by induction on the
length of the S4 proof and since M is a consistent set of ~literals (the very
restricted class of formulas added as an extension is also required).

iff (P ∪ ¬(ℒ̃_P \ M) ∪ ¬¬M)° ⊩_S4 M°, by Lemma 2.

iff P° ∪ □◇(¬(Lit_{P°} \ M*) ∪ M°) ⊩_S4 M°, by set theory and the definition
of °, where M* is obtained from M replacing ~ with ¬. Note that M° = □M*, and
also that (¬S)° = □◇¬S* and (¬¬S)° = □◇□S* for any set of ~literals S.

iff M° is an S4-answer set of P°, by the definition of S4-answer sets.

The following theorem is one of the main contributions of this paper. It 
shows, thanks to the invariance of the answer set semantics with respect to 
intermediate logics proved in [9], that also a wide family of modal logics can be 
used to characterize answer sets using the current approach. 

Theorem 5. Let P be an augmented asp-program, and let M be a consistent
set of ~literals. Then M is an answer set of P iff M° is an X-answer set of P°,
where X is any logic between S4 and S4.3 inclusive.

Proof. Follows from Theorem 4 and results in [9] . 

The transformations proposed and studied in [3] exhibit, in particular, an
embedding of the logic of Nelson N into S4. Recall that the logic of Nelson
already considers two kinds of negation, which correspond to the ¬ and ~ we
introduced here. We believe that, based on the results presented in [3], the
proofs of the theorems presented in this section could be simplified. It would
also make the introduction of ~ less artificial.



6 ASP for Multimodal Logic 

ASP for multimodal logic would require generalizing Definition 1. Instead, we
just introduce an example here and leave a formal presentation for a future
paper. We will consider “The Wise Man Puzzle” to show how nonmonotonic
reasoning can be used to model the knowledge and beliefs of agents where
interaction between them has an important role¹. This puzzle is typically
stated as follows [4]:



1 We would like to emphasise that we are not trying to model the behavior or capa- 
bilities of agents, instead we just try to show how our nonmonotonic system could 
provide a convenient way to represent the knowledge of these agents and the way 
that their knowledge interacts. 




Answer Set Programming and S4 361 



A king wishes to determine which of his three wise men is the wisest.
He arranges them in a circle so that they can see and hear each other
and tells them that he will put a white or a black spot on each of their
foreheads, and that at least one spot will be white. In fact all three are
white. He offers his favor to the one who can tell him the color of his
spot. After a while, the wisest announces that his spot is white. How does
he know?

The solution is based, of course, on the ability of the agents involved to
reason about the knowledge and beliefs of other agents, the information that
can be observed and the rules stated by the king at the beginning of the game.
We will model a shorter and simpler version in which only two wise men
participate. We write w₁ (w₂) to denote that the first (second) wise man has a
white spot on his forehead. We can state a set of assumptions P as follows:



(1)  □₁(w₁ ∨ w₂),    □₂(w₁ ∨ w₂),
     □₁□₂(w₁ ∨ w₂),  □₂□₁(w₁ ∨ w₂),

(2)  □₁(w₁ → □₂w₁),    □₂(w₂ → □₁w₂),
     □₁(¬w₁ → □₂¬w₁),  □₂(¬w₂ → □₁¬w₂),

(3)  □₂◇₂◇₁w₁ → □₂◇₁w₁,
     □₂◇₂◇₁¬w₁ → □₂◇₁¬w₁.

The group of rules under block (1) states the fact that each wise man knows the
king’s announcement, “at least one spot will be white”, and that each also
knows that the other knows this information. The rules in block (2) state that
they know that they can see each other. Finally, the pair of rules under (3)
are the nonmonotonic part of our program. Those rules will make the second wise
man assume, if it makes sense, that the first wise man believes w₁ (or ¬w₁).

Note that P ⊬_S4 □₂w₂ since, in principle, the second wise man cannot know
for sure, using only the information in P, the fact that he has a white spot.
He could try, however, to do nonmonotonic inference, extending his theory with
simple acceptable knowledge. Adding □₂◇₂◇₁¬w₁ to P allows him, in fact, to
prove □₂w₂. If we take the set M = □₂◇₂{□₂w₂, ◇₁w₁, ◇₁¬w₁, □₁w₂} as

a set of simple acceptable knowledge, it turns out that P ∪ M is consistent
and proves the facts {□₂w₁, □₂w₂, □₁w₂}. A reasonable definition of ASP for
multimodal logic should recognise this set as a valid answer set.
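The effect of the extra assumption can also be seen in a small possible-worlds model. The sketch below is only an illustration under assumptions that are not part of the paper: it uses the standard epistemic reading in which each wise man cannot distinguish worlds that agree on the spot he can see, rather than the S4 proof system or the LWB provers mentioned in this section. All names are hypothetical.

```python
# A world records (w1, w2): whether the first / second wise man has a white spot.
# The king's announcement rules out the world where both spots are black.
WORLDS = [(True, True), (True, False), (False, True)]


def accessible(agent, world, worlds):
    """Worlds the agent cannot tell apart from `world`: he only sees the other spot."""
    seen = 1 if agent == 1 else 0   # index of the spot this agent can see
    return [v for v in worlds if v[seen] == world[seen]]


def knows(agent, world, worlds, prop):
    return all(prop(v) for v in accessible(agent, world, worlds))


def possible(agent, world, worlds, prop):
    return any(prop(v) for v in accessible(agent, world, worlds))


def w1(world):
    return world[0]


def w2(world):
    return world[1]


actual = (True, True)

# From the announcement alone the second wise man does not know his own spot.
print(knows(2, actual, WORLDS, w2))

# Assumption: in every world still considered, wise man 1 holds "not w1" possible.
assumed = [v for v in WORLDS if possible(1, v, WORLDS, lambda u: not w1(u))]
print(assumed)
print(knows(2, actual, assumed, w2), knows(1, actual, assumed, w2))
```

Restricting attention to the worlds where the first wise man still considers ¬w₁ possible removes the world in which the second spot is black, after which both □₂w₂ and □₁w₂ hold in the actual world.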

As an important observation, notice that we did not have to explicitly include
w₁ ∧ w₂ as a fact in the program P. This is interesting since agents are
reasoning about the world without depending on the actual situation going on,
thus leaving a more general program. Suppose, for instance, that only the first
wise man has a white spot (i.e. w₁ ∧ ¬w₂); then he could deduce, just after
seeing his partner, □₁w₁. The model M discussed above is no longer consistent
with P ∪ {□₁w₁} and, therefore, it should not be an answer set.

This is exactly the notion of nonmonotonic reasoning we are trying to model. 
It is, in some sense, safe for the second wise man to assume he has a white
spot until he has evidence to believe the opposite, that is, until the first
wise man makes explicit his knowledge about the situation of the world; then he
can provide an answer for sure. What we gain is the possibility to start making
inferences about the knowledge or beliefs of other agents without needing them
to explicitly state such information.

Theorem proving tools of the Logics Workbench LWB 2 , developed at the Uni- 
versity of Bern in Switzerland, were used to check the proofs in this section. 



7 Conclusions 

We proposed a way to do nonmonotonic reasoning using the modal logic S4. We
also showed, generalizing a previous result in intuitionistic logic, how we can
express the well-known answer set semantics using our approach. Observe that,
in principle, it is possible to replace S4 with other stronger logics, up to
S4.3, to get similar nonmonotonic systems. Interesting applications can also
emerge if we allow the use of multimodal logic to model several interacting
agents equipped with nonmonotonic reasoning. Our results clearly state
interesting links between ASP, modal and multi-modal systems, which might bring
research in these areas together.
Abstract. The inductive approach [1] has been successfully used for
verifying a number of security protocols, uncovering hidden assumptions. 
Yet it requires a high level of skill to use: a user must guide the proof pro- 
cess, selecting the tactic to be applied, inventing a key lemma, etc. This 
paper suggests that a proof planning approach [2] can provide automa- 
tion in the verification of security protocols using the inductive approach. 
Proof planning uses AI techniques to guide theorem provers. It has been 
successfully applied in formal methods to software development. Using 
inductive proof planning [3], we have written a method which takes ad- 
vantage of the differences in term structure introduced by rule induction, 
a chief inference rule in the inductive approach. Using difference match- 
ing [4] , our method first identifies the differences between a goal and the 
associated hypotheses. Then, using rippling [5], the method attempts to 
remove such differences. We have successfully conducted a number of 
experiments using HOL-Clam [6], a socket-based link that combines the 
HOL theorem prover [7] and the Clam proof planner [8]. While the key
contribution of this paper centres on a new insight into structuring the
proof of some security theorems, it also reports on the development of
the inductive approach within the HOL system.



1 Introduction 

The inductive approach [1] articulates the verification of security protocols in 
a formal setting. The inductive approach has been widely used for verifying 
protocols, but it requires a high level of skill to use. A user must guide the 
proof process: selecting the subgoaling strategies (called tactics ) to be applied, 
inventing key lemmas, etc. Not only are the proofs deep but also they are onerous 
and cumbersome. 

Proof planning is a meta-level reasoning technique, developed especially to
automate theorem proving [2]. A proof plan expresses the patterns of reasoning
shared by members of the same family of proofs, and is used to drive the search
for new proofs in that family. Proof planning works by using formalised pre-
and post-conditions of tactics as the basis of plan search. These high-level
specifications of tactics are called methods. Proof planning has been successfully
tested in a number of domains, including software verification [9].

This paper suggests that a proof planning approach can provide automation
in the verification of authentication protocols using Paulson's inductive approach,
and reports on a few steps taken towards this aim. Using a proof plan for induction [3],
we have written a method which takes advantage of the differences in term
structure that are introduced by an application of rule induction. Rule induction,
which gives the inductive approach its name, is a chief inference rule in the
verification of security protocols. Our method first identifies the differences
between a goal and its associated hypotheses by applying difference matching [4], and
then removes them by applying rippling [5], the core of the inductive proof plan.

We have successfully tested our method on a number of security goals. We 
have conducted our experiments on HOL-Clam [6], a socket-based link that 
combines the HOL theorem prover [7] and the Clam proof planner [8]. While this
paper's key contribution centres on a new insight into structuring the proof
of some security theorems, it also reports on the development of the inductive 
approach within the HOL system. 

The rest of this paper is organised as follows: Section 2 outlines the induc- 
tive approach to verification, emphasising key proof steps and major difficulties 
in proof discovery. Section 3 describes proof planning, the approach we have 
adopted to automate theorem proving. In particular, Section 3 describes rip- 
pling and difference matching, two techniques central to our method, which is 
described in Section 4. Section 5 indicates directions for further work and
draws conclusions.



2 The Inductive Approach: An Overview 

In the inductive approach, a protocol is characterised as the set of all possible
traces that it can generate. A protocol trace is a list of communication events. There
are four types of events:

Says A B M, which means that agent A has sent message M to agent B;

Gets A M, which means that agent A has received message M; 1

Notes A M, which models that agent A has performed a computation over the
content of message M and then stored the result; and

Oops, which models that an agent has accidentally lost a piece of critical
information, such as a session key.

An attack is modelled directly by modifying the protocol specification, im- 
posing unfaithful participations of one or more agents. 

The model involves three sorts of agents: the server, S, the spy, Spy, and
the friendly agents, A, B, .... The server is absolutely trusted. However, friendly
agents may get compromised; in symbols, $A \in \mathsf{bad}$. Only the spy may read
messages not intended for him. He holds the long-term keys of all compromised
agents, and so may pose as any of them, but he may also send genuine messages using
his own long-term key.

1 Notice that the originator of the message is not taken for granted.

2.1 Messages and Message Analysis 

Messages comprise agent names, $S$, $\mathsf{Spy}$, $A$, $B$, $\ldots$, fresh labels (called nonces),
$N_a, N_b, \ldots$, shared keys, $K_{as}, K_{bs}, \ldots$, public keys, $K_a, K_b, \ldots$, private keys,
$K_a^{-1}, K_b^{-1}, \ldots$, session keys, $K_{ab}, K_{ac}, \ldots$, compound messages, $\{\!|X, Y|\!\}$, and
messages encrypted under key $K$, $\{\!|X|\!\}_K$. By convention, an encrypted message
can neither be read nor altered without the corresponding encryption key.

The theory of messages involves three main operators. Each operator is de- 
fined on possibly infinite sets of messages and is used to model properties of 
a protocol and reason about its runs. Let H be a set of messages, then the 
operators are given as follows: 

parts H returns all the message components that can be recursively extracted
from messages in H by projection and decryption;

analz H is like parts H, except that decryption of traffic is performed only using
available keys; and

synth H models the messages that the spy can forge using only H and available
keys.
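The three operators can be illustrated with a small executable sketch. It is only an approximation under stated assumptions: messages are encoded as nested tuples (a hypothetical encoding, not the HOL datatype), only symmetric keys are modelled, and synth, which denotes an infinite set, is represented by a membership test called in_synth.

```python
# Hypothetical message encoding (an assumption of this sketch):
#   ("agent", name), ("nonce", n), ("key", k), ("pair", x, y), ("crypt", k, x)
# Only symmetric keys are modelled: a key decrypts what it encrypts.

def parts(H):
    """All components reachable by projection and by decryption with any key."""
    seen, todo = set(), list(H)
    while todo:
        m = todo.pop()
        if m in seen:
            continue
        seen.add(m)
        if m[0] == "pair":
            todo += [m[1], m[2]]
        elif m[0] == "crypt":
            todo.append(m[2])          # the body is exposed regardless of the key
    return seen

def analz(H):
    """Like parts, but a ciphertext is opened only if its key is already known."""
    known = set(H)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            new = []
            if m[0] == "pair":
                new = [m[1], m[2]]
            elif m[0] == "crypt" and ("key", m[1]) in known:
                new = [m[2]]
            for x in new:
                if x not in known:
                    known.add(x)
                    changed = True
    return known

def in_synth(m, H):
    """Membership test for synth H (the set itself is infinite)."""
    if m in H or m[0] == "agent":      # agent names are assumed public
        return True
    if m[0] == "pair":
        return in_synth(m[1], H) and in_synth(m[2], H)
    if m[0] == "crypt":
        return ("key", m[1]) in H and in_synth(m[2], H)
    return False                        # nonces and keys cannot be guessed
```

For instance, a nonce sealed under a key the spy does not hold belongs to parts H but not to analz H, which is the distinction the two operators are meant to capture.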



2.2 Events 

The theory of messages is at the bottom of a 3-layer, hierarchical model. Building 
on the theory of messages, the theory of events is at the next upper layer. The 
event theory aims at characterising the knowledge an agent may reach upon a run 
of a protocol. Additionally, it formalises a notion of message freshness, which is 
useful to separate past and present. There are three key relation symbols defined 
in events: 

1. initState A, which models the initial knowledge of agent A; 

2. knows A evs, which models what agent A can learn from a protocol trace, 
evs, considering A’s initial knowledge; and 

3. used evs, which allows one to determine whether or not an element is fresh 
with respect to protocol trace evs. 



2.3 Shared-Key and Public-Key Cryptography 

At the top of the hierarchy, depending on the kind of protocol under analysis,
there is either of two theories: shared, for protocols involving shared-key
cryptography, and public, for protocols involving public-key and possibly shared-key
cryptography.

Similar to Schneider's [10], Paulson's approach considers security properties
from a high-level view. Thus, the identity of the spy is known and may
be used to express properties. For example, let $\mathit{evs}$ denote a protocol run; then
$X \in \mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evs})$ holds only if $X$ is in the traffic, or if it is part of a
compound message, or if it is in the body of a message encrypted under a key known
to the spy. As another example property, consider $X \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evs})$,
which is used to specify that $X$ is part of the traffic and so it has been issued,
possibly inside a larger, encrypted message.

The analysis of a protocol proceeds by rule induction. It involves the use
of over 400 properties, either (in)equalities or implications. These properties
refer only to the inductive approach. Subsidiary results, imported from ancestor
theories (for example, lists, sets and arithmetic), are also required.

A protocol is described as a collection of inference rules. Each inference rule
has zero or more hypotheses (enclosed by square brackets, with semicolons
separating them) and one conclusion, joined by $\vdash$. The inference rules state the
various ways in which a protocol trace can be extended with new
events. Fig. 1 shows a partial description of the Otway-Rees protocol. There,
$\#$ and $\mathsf{set}$ respectively stand for the list constructor function and the function
that converts a list into a set.



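Rule Name    Definition

Nil          $[\,] \vdash [\,] \in \mathsf{otway}$

Fake         $[\mathit{ef} \in \mathsf{otway};\ X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{ef}))]$
             $\vdash \mathsf{Says}\ \mathsf{Spy}\ B\ X \,\#\, \mathit{ef} \in \mathsf{otway}$

Reception    $[\mathit{er} \in \mathsf{otway};\ \mathsf{Says}\ A\ B\ X \in \mathsf{set}\ \mathit{er}]$
             $\vdash \mathsf{Gets}\ B\ X \,\#\, \mathit{er} \in \mathsf{otway}$

OR1          $[\mathit{e1} \in \mathsf{otway};\ \mathsf{Nonce}\ NA \notin \mathsf{used}\ \mathit{e1}]$
             $\vdash \mathsf{Says}\ A\ B\ \{\!|\mathsf{Nonce}\ NA, \mathsf{Agent}\ A, \mathsf{Agent}\ B,$
             $\quad \{\!|\mathsf{Nonce}\ NA, \mathsf{Agent}\ A, \mathsf{Agent}\ B|\!\}_{K_{as}}|\!\} \,\#\, \mathit{e1} \in \mathsf{otway}$

OR2          $[\mathit{e2} \in \mathsf{otway};\ \mathsf{Nonce}\ NB \notin \mathsf{used}\ \mathit{e2};$
             $\ \mathsf{Gets}\ B\ \{\!|\mathsf{Nonce}\ NA, \mathsf{Agent}\ A, \mathsf{Agent}\ B, X|\!\} \in \mathsf{set}\ \mathit{e2}]$
             $\vdash \mathsf{Says}\ B\ S\ \{\!|\mathsf{Nonce}\ NA, \mathsf{Agent}\ A, \mathsf{Agent}\ B, X,$
             $\quad \{\!|\mathsf{Nonce}\ NA, \mathsf{Nonce}\ NB, \mathsf{Agent}\ A, \mathsf{Agent}\ B|\!\}_{K_{bs}}|\!\} \cdots$

OR4          $\ldots$

Oops         $[\mathit{eo} \in \mathsf{otway};\ \mathsf{Says}\ S\ B\ \{\!|\mathsf{Nonce}\ NA, X, \{\!|\mathsf{Nonce}\ NB, \mathsf{Key}\ K|\!\}_{K_{bs}}|\!\} \in \mathsf{set}\ \mathit{eo}]$
             $\vdash \mathsf{Notes}\ \mathsf{Spy}\ \{\!|\mathsf{Nonce}\ NA, \mathsf{Nonce}\ NB, \mathsf{Key}\ K|\!\} \,\#\, \mathit{eo} \in \mathsf{otway}$

Fig. 1. (Partial) Inductive definition of the Otway-Rees protocol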



2.4 Isabelle 

The inductive approach has been implemented in Isabelle/HOL, the Higher-Order
Logic instantiation of the Isabelle generic theorem prover [11]. Isabelle
contains a large collection of assorted proof methods, including a simplifier and
several classical reasoners. These methods are all powerful and they all accept
a number of switches, with which one can extend their scope of application.
Isabelle also contains a variety of proof support tools, with which one can easily
define new symbols, set a suitable, succinct syntax, and so on.

Despite Isabelle's power, using it to verify a security protocol
demands a high level of skill. The proofs are so deep, onerous and cumbersome
that applying a proof method without any switch is unlikely to be
enough. Selecting the right switch for a proof method requires mastery of
Isabelle. For example, for blast 2 to work smoothly, a developer is required
to carefully select which properties are to be used as elimination, introduction or
destruction rules. She is also required to know the content of the classical rule sets
in order to use them properly whenever necessary. In a similar vein, a developer
is required to carefully select which properties are to be used for simplification.
So, she constantly runs the risk of non-termination or of having to work harder
to find more proofs. Thus the inductive approach to protocol verification poses
many challenges to full automation. This paper suggests that a proof planning
approach can improve the existing level of automation in this domain, or at least
extend it so as to take user-level interaction to a more adequate, convenient level.



3 Proof Planning 

Proof planning approaches automatic theorem proving in two stages: one for
building a proper proof plan for a given goal, and another for executing the plan
to obtain a proof of the goal.

3.1 Methods 

Methods are the building-blocks of proof planning. A method is a high-level 
description of a tactic, containing an input sequent, preconditions, output se- 
quents, and effects or postconditions. A method is said to be applicable if the 
current goal matches the method input sequent and the method preconditions 
hold. Preconditions specify properties of the input sequent, with which proof 
planning predicts if the tactic associated with the method is applicable without 
actually running it, and likewise for the postconditions. The result of a method 
application is a list of output sequents, possibly empty. 

The proof planner develops the plan by selecting a method applicable to 
the current goal. If no method is applicable, it will terminate, reporting failure. 
Otherwise, the proof planner will give consideration to the subgoals returned by 
the first applicable method. This process is applied recursively to each subgoal 
till no more methods are applicable, or, as in the normal case of success, all the 
leaves of the proof plan tree return an empty list of output sequents. 
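The planning loop just described can be sketched in a few lines. This is a toy illustration and not Clam's implementation: the names Method, Plan and plan, the depth bound, and the two string-based example methods are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Method:
    name: str
    precondition: Callable[[str], bool]   # does the method apply to this goal?
    outputs: Callable[[str], List[str]]   # predicted subgoals (empty list = closed)

@dataclass
class Plan:
    method: str
    goal: str
    children: List["Plan"]

def plan(goal: str, methods: List[Method], depth: int = 20) -> Optional[Plan]:
    """Take the first applicable method, then plan each of its subgoals."""
    if depth == 0:
        return None
    for m in methods:
        if m.precondition(goal):
            children = []
            for sub in m.outputs(goal):
                child = plan(sub, methods, depth - 1)
                if child is None:
                    return None           # no backtracking in this sketch
                children.append(child)
            return Plan(m.name, goal, children)
    return None                           # no applicable method: report failure

# Toy methods over string goals (hypothetical, for illustration only).
axioms = {"p", "q"}
methods = [
    Method("split_and", lambda g: " & " in g, lambda g: g.split(" & ", 1)),
    Method("axiom", lambda g: g in axioms, lambda g: []),
]
```

Calling plan("p & q", methods) returns a two-leaf plan tree, whereas plan("p & r", methods) returns None because no method applies to the subgoal r. Note that the tactics themselves are never run during planning; only the method specifications are consulted.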

Inductive proof planning is the application of proof planning to automating
inductive theorem proving [3]. It involves the use of rippling [5], a heuristic that
guides the search for a proof of inductive cases and supports the selection of
appropriate induction schemata. Rippling guides the manipulation of the induction
conclusion to enable the use of a hypothesis, a step called fertilisation. The key idea
behind rippling lies in the observation that an initial induction conclusion is a
copy of one of the hypotheses, except for extra terms. By marking such
differences explicitly, rippling can attempt to place them at positions where they no
longer preclude the conclusion and hypothesis from matching. Rippling applies
a special kind of rewrite rules, called wave-rules, which manipulate the
differences between two terms (wave-fronts), while keeping their common structure
(skeleton) intact.

Annotated terms, called wave-terms, e.g. $\boxed{\mathit{ev} \,\#\, \underline{\mathit{evs}}}$, are composed of a
wave-front and one or more wave-holes (the box marks the wave-front and the
underline the wave-hole). Wave-fronts, e.g. $\mathit{ev}\ \#$, are expressions that
appear in the induction conclusion but not in the induction hypothesis. Conversely,
wave-holes, e.g. $\mathit{evs}$, are expressions that appear in wave-terms and also in the
induction hypothesis.

2 Blast is Isabelle's main workhorse and one of the classical reasoners.



3.2 Difference Unification 

Wave-annotations are introduced by applying difference unification [4] to the 
hypotheses and the conclusion. Difference unification extends unification so that 
differential structures between the terms to be unified can also be hidden, while 
computing a substitution. 

Ground difference unification is like difference unification, except that it is
restricted to ground terms; it is used to automate the dynamic generation of
wave-rules. Ground difference matching is one-way ground difference unification
and is used to mark structural differences between terms with wave-annotations, in
order to serve rippling.

The inductive proof plan consists of four compound methods: i) base_case, ii)
normalize, iii) generalise and iv) ind_strat. A compound method is a method
that calls other, possibly compound, methods from its pre- or post-conditions.
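Returning to ground difference matching, the idea can be made concrete with a simplified sketch. It is not the algorithm of [4]: it returns only the first annotation it finds, whereas difference matching in general may yield several, and the term encoding (strings for constants, tuples for applications) together with the markers WAVE and HOLE are assumptions of this sketch.

```python
def diff_match(goal, hyp):
    """Annotate `goal` so that erasing its wave-fronts yields `hyp`; None if impossible."""
    if goal == hyp:
        return goal
    if isinstance(goal, tuple):
        f, args = goal[0], goal[1:]
        # Case 1: same functor and arity, so match argument by argument.
        if isinstance(hyp, tuple) and hyp[0] == f and len(hyp) == len(goal):
            matched = [diff_match(g, h) for g, h in zip(args, hyp[1:])]
            if all(m is not None for m in matched):
                return (f, *matched)
        # Case 2: f is extra structure, so wrap it as a wave-front around one argument.
        for i, g in enumerate(args):
            inner = diff_match(g, hyp)
            if inner is not None:
                return ("WAVE", f, i, args[:i] + (("HOLE", inner),) + args[i + 1:])
    return None

def skeleton(t):
    """Erase wave-fronts, keeping only the content of wave-holes."""
    if isinstance(t, tuple):
        if t[0] == "WAVE":
            return skeleton(t[3][t[2]][1])
        return (t[0], *(skeleton(a) for a in t[1:]))
    return t
```

Matching parts (knows Spy (Says Spy B X # evsf)) against parts (knows Spy evsf) in this encoding marks the list constructor and the Says event as the wave-front and evsf as the wave-hole; erasing the annotation with skeleton gives back the hypothesis term.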



4 Difference Reduction in Protocol Verification 

We approach the problem of automatically verifying security goals by using two
systems. One is Clam [8], a proof planner, and the other is HOL [7], a theorem
prover for higher-order logic. From Clam's perspective, our method consists
of a straightforward application of the inductive proof plan. It makes use of the
four inductive proof methods and adds a new one, called difference_reduction.
difference_reduction is like step_case, except that it first attempts to
annotate the goal at hand and, if successful, removes the annotated differences using
the standard heuristic knowledge underlying step_case. difference_reduction annotates
a sequent by ground difference matching the goal against the hypotheses.
It takes advantage of the differences in term structure introduced by rule
induction.

From HOL's perspective, our method is just a powerful tactic, called
ONCE_CLAM_TAC. The user actually deals with protocol verification through HOL,
using the implementation of the inductive approach that we have already
developed. 3 When facing a conjecture, the user simply calls ONCE_CLAM_TAC via
HOL-Clam [6], a socket-based link that combines the HOL theorem prover and
the Clam proof planner. In the normal case of success, ONCE_CLAM_TAC executes
the tactic returned by Clam, which either proves the goal or makes
substantial progress towards proving it.

We illustrate our approach using a running example. Suppose that we want to
prove that for Otway-Rees (see Fig. 1), an agent's long-term key, Key (shrK A),
is on the network traffic if and only if that agent is compromised; in symbols:

\[
\mathit{evs} \in \mathsf{otway} \rightarrow (\forall A.\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evs}) = A \in \mathsf{bad}) \qquad (1)
\]

A proof of (1) proceeds by rule induction, an application of which yields 8
subgoals. The goal related to the Fake protocol rule is the following: 4



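\[
\begin{array}{l}
[\mathit{evsf} \in \mathsf{otway};\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}) = A \in \mathsf{bad};\\
\ \ X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))]\\
\vdash \forall B.\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \boxed{(\mathsf{Says}\ \mathsf{Spy}\ B\ X) \,\#\, \underline{\mathit{evsf}}}) = A \in \mathsf{bad}
\end{array}
\]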



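Notice the similarity between this goal and the Fake inference rule of the
protocol (see Fig. 1). A simple ripple proof applies the rules below: 5

\[
\begin{array}{rcll}
\mathsf{knows}\ \mathsf{Spy}\ (\boxed{(\mathsf{Says}\ A\ B\ X) \,\#\, \underline{\mathit{evs}}}) & \Rightarrow & \boxed{X :: \underline{\mathsf{knows}\ \mathsf{Spy}\ \mathit{evs}}} & \\
\mathsf{parts}\,(\boxed{X :: \underline{H}}) & \Rightarrow & \boxed{\mathsf{parts}\,\{X\} \cup \underline{\mathsf{parts}\ H}} & \qquad (2)\\
x \in \boxed{s \cup \underline{t}} & \Rightarrow & \boxed{x \in s \vee \underline{x \in t}} &
\end{array}
\]

where $::$ denotes the set constructor function. Notice that including (2) in any
term-rewriting system inevitably yields non-termination.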
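As a hedged illustration (a reconstruction for the reader, not a trace produced by Clam), the three wave-rules can be chained on the conclusion of the Fake subgoal as follows. The abbreviations $k$ for Key (shrK A) and $S$ for knows Spy evsf are introduced here only for brevity.

```latex
\begin{align*}
 & k \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \boxed{(\mathsf{Says}\ \mathsf{Spy}\ B\ X) \,\#\, \underline{\mathit{evsf}}}) \\
\Rightarrow\ & k \in \mathsf{parts}\,(\boxed{X :: \underline{S}}) \\
\Rightarrow\ & k \in \boxed{\mathsf{parts}\,\{X\} \cup \underline{\mathsf{parts}\ S}} \\
\Rightarrow\ & \boxed{k \in \mathsf{parts}\,\{X\} \vee \underline{k \in \mathsf{parts}\ S}}
\end{align*}
```

After the last step the wave-hole is a copy of the left-hand side of the induction hypothesis, so the hypothesis can be used to rewrite it.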

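Fertilisation is applicable to the current goal; its application leaves us with a
new goal of the form:

\[
\begin{array}{l}
[\mathsf{otway}\ \mathit{evsf};\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}) = A \in \mathsf{bad};\\
\ \ X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))]\\
\vdash \forall B.\ (\mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,\{X\} \vee A \in \mathsf{bad}) = A \in \mathsf{bad}
\end{array}
\]

A further simplification transforms this goal into a new one:

\[
\begin{array}{l}
[\mathsf{otway}\ \mathit{evsf};\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,\{X\};\ X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))]\\
\vdash \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}) \qquad (3)
\end{array}
\]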

3 To obtain a HOL implementation of the inductive approach, the reader is referred
to http://webdia.cem.itesm.mx/ac/raulm/pub/33337-A.

4 Notice that, for the sake of simplicity, we have already included the annotations. 

5 We emphasise that wave-rules are generated dynamically, from the symbol defini- 
tional (in)equations, each of which may give rise to many different wave-rules. 
Notice how rippling moves the wave-fronts outwards, up to the top of the
term tree structure: the boxes come to dominate the structure of the goal formula. When
no further rippling is applicable, fertilisation can often be applied to simplify
the goal. Since wave-rules must preserve the skeleton of the annotated goal, the
search space induced by rippling is smaller than that induced by rewriting at the
object-level. Thus, difference_reduction, inheriting from step_case, decreases the
proof search space.

However, our experiments show that, in order to establish most security goals,
we often need to iterate several applications of difference_reduction. This is
because a single application of this method outputs a goal that is either trivially
established, via base_case for example, or that has little structure for rippling to exploit.
Goal (3) is an example with no structure to be exploited by rippling. In
these cases, one has to manually link the goals to enable further rippling.

Fortunately, linking hypotheses to enable proof discovery is a task that can 
be easily automated, by fully saturating the hypotheses, but at the expense of 
increasing the search space. We are currently working on a method to prescrip- 
tively drive the normalisation of a goal to an intended stage, thus keeping the 
search space moderately small. 

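Going back to our running example, the hypothesis $X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))$
gives $\{X\} \subseteq \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))$. Since parts is monotonic with respect
to subset, it is easy to transform the hypothesis list as follows:

\[
\begin{array}{l}
[\mathsf{otway}\ \mathit{evsf};\ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,\{X\};\ X \in \mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}));\\
\ \ \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{synth}\,(\mathsf{analz}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf})))]\\
\vdash \mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf})
\end{array}
\]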

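This goal can be further rippled, except that the ripple would have to be
performed on the hypotheses. In these cases, we use the contrapositive of an
implication:

\[
P \rightarrow Q \vdash \neg Q \rightarrow \neg P \qquad (4)
\]

before annotating the goal. Using this simple manoeuvre, we have avoided the
need to write new HOL tactics that could do the ripple on the hypotheses.
Thus, applying (4) and introducing the annotations, we have to prove that:

\[
\begin{array}{l}
[\mathsf{otway}\ \mathit{evsf};\ \ldots;\ \neg(\mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf}))]\\
\vdash \neg(\mathsf{Key}\,(\mathsf{shrK}\ A) \in \mathsf{parts}\,(\boxed{\mathsf{synth}\,(\mathsf{analz}\ \underline{(\mathsf{knows}\ \mathsf{Spy}\ \mathit{evsf})})}))
\end{array}
\]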

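In this case, rippling applies the rules below:

\[
\begin{array}{rcl}
\mathsf{parts}\,(\boxed{\mathsf{synth}\ \underline{H}}) & \Rightarrow & \boxed{\underline{\mathsf{parts}\ H} \cup \mathsf{synth}\ H}\\
X \in \boxed{\underline{s} \cup t} & \Rightarrow & \boxed{\underline{X \in s} \vee X \in t}\\
\mathsf{parts}\,(\boxed{\mathsf{analz}\ \underline{H}}) & \Rightarrow & \mathsf{parts}\ H\\
\neg\,\boxed{\underline{A} \vee B} & \Rightarrow & \boxed{\underline{\neg A} \wedge \neg B}
\end{array}
\]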
This leaves a subgoal that is trivially established, using the additional
results:

\[
\begin{array}{l}
\mathsf{Key}\ K \in \mathsf{synth}\ H = \mathsf{Key}\ K \in H\\
X \in \mathsf{analz}\ H \rightarrow X \in \mathsf{parts}\ H
\end{array}
\]
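The closing steps can be sketched as follows. This is a reconstruction for illustration, not a trace produced by Clam; it assumes the identities parts (synth H) = parts H ∪ synth H and parts (analz H) = parts H from the theory of messages, and it abbreviates Key (shrK A) as $k$ and knows Spy evsf as $S$ (both abbreviations are introduced only for this sketch).

```latex
\begin{align*}
 & \neg\,(k \in \mathsf{parts}\,(\mathsf{synth}\,(\mathsf{analz}\ S))) \\
\Rightarrow\ & \neg\,(k \in \mathsf{parts}\,(\mathsf{analz}\ S) \cup \mathsf{synth}\,(\mathsf{analz}\ S)) \\
\Rightarrow\ & \neg\,(k \in \mathsf{parts}\,(\mathsf{analz}\ S) \vee k \in \mathsf{synth}\,(\mathsf{analz}\ S)) \\
\Rightarrow\ & \neg\,(k \in \mathsf{parts}\ S \vee k \in \mathsf{synth}\,(\mathsf{analz}\ S)) \\
\Rightarrow\ & \neg\,(k \in \mathsf{parts}\ S) \wedge \neg\,(k \in \mathsf{synth}\,(\mathsf{analz}\ S))
\end{align*}
```

The first conjunct is the hypothesis itself. For the second, the first additional result reduces $k \in \mathsf{synth}\,(\mathsf{analz}\ S)$ to $k \in \mathsf{analz}\ S$, and the second result turns that into $k \in \mathsf{parts}\ S$, which the hypothesis denies.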

Table 1 shows a few security goals, all related to the Otway-Rees protocol,
in which difference_reduction was used to drive the search for a proof. Space
constraints prevent us from presenting a complete proof. Yet, we have found
that the above proof pattern appears in a number of proofs of security
goals. The full test set, including HOL, HOL-Clam, Clam and the methods for
protocol verification, is available upon request by e-mail to
the first author.

Proof planning offers a number of techniques to automate theorem proving.
While rippling is at the heart of the inductive proof plan, there are other
techniques that are worth mentioning. For instance, the base_case method makes
use of a recursive-path ordering (RPO) in order to extract rewrite rules
from symbol definitions fully automatically. This RPO term-rewriting system is not
proof-equivalent to Isabelle's simplifier, because base_case
includes rewrite rules obtained from inequalities and implications. By
comparison, within Isabelle, implication properties are normally used for elimination or
destruction, never for simplification.

Our method can partially drive the search for a proof of a security goal.
By contrast, the Isabelle blast method is able to prove a number of goals
without interaction. This is because it saturates the hypotheses in such a way that most
of their logical consequences are pulled out. When a single application of blast



Table 1. Otway-Rees security goals solved using difference reduction 

No. Property 

1 evs £ otway — > Gets B X £ set evs — * 3 A. Says A B X £ set evs 

2 evs £ otway — > -i(A £ bad) — ► 

Crypt (shrK A)({|Af 4 , Agent A, Agent B|}) £ parts (knows Spy evs) — > 

-"(Crypt (shrK ^4) ( {| , Na, Agent A ' , Agent T|}) £ parts (knows Spy evs)) 

3 evs £ otway — > ->(Y £ bad ) — > 

Crypt (shrK A)({|Abi, Agent A, Agent B|}) £ parts (knows Spy evs) —* 

Says A B 

flAbt, Agent A, Agent B , {\N A , Agent A, Agent -B|}.fs: as |} £ set evs 

4 -i [A £ bad ) — > otway evs — > 

Crypt (shrK A)({|ACi, Agent A, Agent B|}) £ parts (knows Spy evs) —> 

Crypt (shrK .4 )({|./Va, Agent A, Agent C|}) £ parts (knows Spy evs) — > 

(B = C) 

5 -"(A £ bad ) — > evs £ otway — > 

Says A BQNa, Agent A, Agent B, {|IVa, Agent A, Agent B\}k as |} £ set evs — > 
{|1Va, (Key A')|} £ parts (knows Spy evs ) —> 

(3 Nb-', Says S B 

QNa^Na, (Key K)\}k as JY b , (Key A')|}x BS |} £ set evs) 
fails, however, one needs to be an expert to find the required switch or identify
the property that is still missing for a proof to be found.

5 Further Work and Conclusions 

Further work involves searching for an algorithm that could automatically and
prescriptively identify how to link several hypotheses to allow more rippling.
This would enable us to prove simple security goals completely. Further work
also involves formulating a proof method to automatically guide the application
of rule induction. This method would play a key role in a proof plan for rule
induction.

Our results are encouraging. We believe proof planning can help structure
proofs of security protocols so as to reduce both the human skill levels and the
development time required to verify a security protocol. Moreover, we believe
proof planning may help us understand how to use failure in the search for a proof so
as to either suggest high-level, intelligible changes to the structure of a faulty
protocol, or synthesise an attack.

References 

1. Paulson, L.C.: The Inductive Approach to Verifying Cryptographic Protocols. 
Journal of Computer Security 6 (1998) 85-128 

2. Bundy, A.: The Use of Explicit Plans to Guide Inductive Proofs. In Lusk, R.,
Overbeek, R., eds.: Proceedings of the 9th Conference on Automated Deduction.
Lecture Notes in Computer Science, Vol. 310, Argonne, Illinois, USA,
Springer-Verlag (1988) 111-120

3. Bundy, A., van Harmelen, F., Hesketh, J., Smaill, A.: Experiments with proof 
plans for induction. Journal of Automated Reasoning 7 (1991) 303-324 

4. Basin, D., Walsh, T.: Difference unification. In Bajcsy, R., ed.: Proceedings of the
13th International Joint Conference on Artificial Intelligence, IJCAI'93. Volume 1,
San Mateo, CA, Morgan Kaufmann (1993) 116-122

5. Bundy, A., Stevens, A., van Harmelen, F., Ireland, A., Smaill, A.: Rippling: A 
heuristic for guiding inductive proofs. Artificial Intelligence 62 (1993) 185-253 

6. Boulton, R., Slind, K., Bundy, A., Gordon, M.: An interface between CLAM
and HOL. In Grundy, J., Newey, M., eds.: 11th International Conference on
Theorem Proving in Higher-Order Logics (TPHOLs'98). Lecture Notes in Computer
Science, Vol. 1479, Canberra, Australia, Springer-Verlag (1998) 87-104

7. Gordon, M.: HOL: A proof generating system for higher-order logic. In Birtwistle, 
G., Subrahmanyam, P.A., eds.: VLSI Specification, Verification and Synthesis, 
Kluwer (1988) 

8. Bundy, A., van Harmelen, F., Horn, C., Smaill, A.: The Oyster-Clam system. In
Stickel, M.E., ed.: Proceedings of the 10th International Conference on Automated
Deduction. Lecture Notes in Artificial Intelligence, Vol. 449, Springer-Verlag (1990)
647-648

9. Monroy, R., Bundy, A., Green, I.: Planning Proofs of Equations in CCS. Auto- 
mated Software Engineering 7 (2000) 263-304 




374 J.C. Lopez and R. Monroy 



10. Schneider, S.: Modelling Security Properties with CSP. Computer Science De- 
partment, Technical Report Series CSD-TR-96-04, Royal Holloway, University of 
London (1996) 

11. Paulson, L.C.: Isabelle: the next 700 theorem provers. In Odifreddi, P., ed.: Logic 
and Computer Science, Academic Press (1990) 77-90 




On Some Differences Between Semantics of 
Logic Program Updates 



Joao Alexandre Leite 

CENTRIA, New University of Lisbon, Portugal 



Abstract. Since the introduction of logic program updates based on causal rejec- 
tion of rules, several different semantics were set forth. All these semantics were 
introduced with the same underlying motivation i.e., to overcome the drawbacks 
of interpretation based updates by applying the principle of inertia to program 
rules, but they were all defined for different classes of logic programs thus making 
their comparisons difficult. In this paper we redefine such existing semantics, and 
set forth a new one, all in the more general setting of Generalized Logic Programs, 
in a way that facilitates their comparisons. Subsequently, we take a closer look at 
the subtle differences between these otherwise similar approaches. 



1 Introduction 

Concerning modifications to a knowledge base represented by a propositional theory, two
abstract frameworks have been distinguished in [16] and [7]. One, theory revision, deals
with incorporating new knowledge about a static world whose previous representation
was incomplete or incorrect. The other deals with changing worlds, and is known as
theory update. This paper addresses the issue of updates of logic programs, a subject
that was recently put into context in [5].

Until recently, the work devoted to this issue followed the so-called interpretation
update approach, based on the idea of reducing the problem of finding the update
of a knowledge base by another knowledge base to the problem of finding the
updates of its individual models. In [13] the authors introduce the framework of Revision
Programming 1 where they allow the specification of updates of interpretations.

In [9], it is pointed out that such an approach suffers from important drawbacks when
the initial knowledge base is not purely extensional (it contains rules), and a new
approach is proposed where the principle of inertia is applied to the rules of the initial knowledge base,
rather than to its model literals. This led to the paradigm of Dynamic Logic Programming.
A Dynamic Logic Program (DLP) is a sequence of Logic Programs where each represents 
a time period (state) and contains some knowledge that is supposed to be true at the state. 
The mutual relationships existing between different states (specified as the ordering 
relation) are then used to determine its declarative semantics. 

Since the introduction of this form of updates of logic programs, several different 
semantics were set forth [9, 10, 2, 4, 1, 8], with the motivation of overcoming the draw- 
backs of interpretation based updates by applying the principle of inertia to program 



1 Despite the name, the authors are actually dealing with updates and not revisions. 

C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 375-385, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 
rules. But they were all defined for different classes of logic programs, thus making their
comparisons difficult. For example, [9, 10] define the semantics of Justified Updates for 
sequences of Revision Programs (a variant of Logic Programs), [2, 8] define the seman- 
tics of Stable Models for sequences of Generalized ( Extended ) Logic Programs ( GLP ) 
(Logic Programs allowing both default and strong negation in the heads of rules), and [4] 
define the semantics of Update Answer Sets for sequences of Extended Logic Programs. 
In [4], bridges between these semantics are established for restricted classes of logic 
programs, showing that under some restrictions the semantics coincide. 

In this paper we take a closer look at the differences between these similar approaches. 
For this, we start by establishing the definitions of six different semantics, all set forth in 
a similar manner, thus making their comparisons easier. Four of these six semantics, all 
defined for sequences of GLP 2 , either coincide or generalize the semantics mentioned 
before. The remaining two are new proposals. Subsequently, we compare all six seman- 
tics, either by means of examples or by means of properties, mostly with the intention 
of bringing out their differences rather than their similarities. 

The paper is structured as follows: in Sect. 2 we recall some definitions; in Sect. 3
we define the six mentioned semantics and relate them to the ones in the literature; in
Sect. 4 we establish some comparisons; in Sect. 5 we wrap up.



2 Preliminaries 

Let $\mathcal{A}$ be a set of propositional atoms. An objective literal is either an atom $A$ or a
strongly negated atom $\neg A$. A default literal is an objective literal preceded by $not$. A
literal is either an objective literal or a default literal. A rule $r$ is an ordered pair
$H(r) \leftarrow B(r)$ where $H(r)$ (dubbed the head of the rule) is a literal and $B(r)$ (dubbed
the body of the rule) is a finite set of literals. A rule with $H(r) = L_0$ and
$B(r) = \{L_1, \ldots, L_n\}$ will simply be written as $L_0 \leftarrow L_1, \ldots, L_n$. A
tautology is a rule of the form $L \leftarrow Body$ with $L \in Body$. A generalized logic program
(GLP) $P$, in $\mathcal{A}$, is a finite or infinite set of rules. A program is called an extended
logic program (ELP) if no default literals appear in the heads of its rules. If $H(r) = A$ (resp.
$H(r) = not\ A$) then $not\ H(r) = not\ A$ (resp. $not\ H(r) = A$). If $H(r) = \neg A$, then
$\neg H(r) = A$. By the expanded generalized logic program corresponding to the GLP $P$, denoted
by $\mathbf{P}$, we mean the GLP obtained by augmenting $P$ with a rule of the form
$not\ \neg H(r) \leftarrow B(r)$ for every rule, in $P$, of the form $H(r) \leftarrow B(r)$, where
$H(r)$ is an objective literal. An interpretation $M$ of $\mathcal{A}$ is a set of objective
literals that is consistent, i.e., $M$ does not contain both $A$ and $\neg A$. An objective literal
$L$ is true in $M$, denoted by $M \models L$, iff $L \in M$, and false otherwise. A default literal
$not\ L$ is true in $M$, denoted by $M \models not\ L$, iff $L \notin M$, and false otherwise. A set
of literals $B$ is true in $M$, denoted by $M \models B$, iff each literal in $B$ is true in $M$.
An interpretation $M$ of $\mathcal{A}$ is an answer set of a GLP $P$ iff
$M' = least(P \cup \{not\ A \mid A \notin M\})$, where $M' = M \cup \{not\_A \mid A \notin M\}$,
$A$ is an objective literal, and $least(\cdot)$ denotes the least model of the definite program
obtained from the argument program by replacing every default literal $not\ A$ by a new
atom $not\_A$. Let $AS(P)$ denote the set of answer-sets of $P$.

2 For motivation on the need for GLPs in updates, the reader is referred to [2, 8]. Note that within
the context of updates, GLPs are not equivalent to Logic Programs with Integrity Constraints.
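The guess-and-check reading of this definition can be sketched in Haskell for small finite programs. The sketch below is an illustration only: the type and function names (`Obj`, `Lit`, `Rule`, `least`, `isAnswerSet`, `answerSets`) are assumptions introduced here, and the enumeration of all candidate interpretations is exponential in the number of atoms.

```haskell
import Data.List (nub, sort, subsequences)

-- Objective literals: an atom or its strong negation.
data Obj = Pos String | Neg String deriving (Eq, Ord, Show)

-- Literals: an objective literal, possibly under default negation.
data Lit = O Obj | Not Obj deriving (Eq, Ord, Show)

-- A rule H(r) <- B(r).
data Rule = Rule { hd :: Lit, body :: [Lit] } deriving (Eq, Show)

type GLP = [Rule]

compl :: Obj -> Obj
compl (Pos a) = Neg a
compl (Neg a) = Pos a

-- Least model of a definite program; default literals are read as fresh atoms.
least :: [Rule] -> [Lit]
least rs = go []
  where
    go m = let m' = sort (nub (m ++ [hd r | r <- rs, all (`elem` m) (body r)]))
           in if m' == m then m else go m'

-- M is an answer set iff M' = least(P U {not A | A not in M}).
isAnswerSet :: [Obj] -> GLP -> [Obj] -> Bool
isAnswerSet univ p m = consistent && sort m' == least (p ++ facts)
  where
    consistent = all (\l -> compl l `notElem` m) m
    defaults   = [Not a | a <- univ, a `notElem` m]
    facts      = [Rule d [] | d <- defaults]
    m'         = map O m ++ defaults

-- All answer sets over the objective literals built from the given atoms.
answerSets :: [String] -> GLP -> [[Obj]]
answerSets atoms p = [m | m <- subsequences univ, isAnswerSet univ p m]
  where univ = concat [[Pos a, Neg a] | a <- atoms]
```

For instance, the program with the two rules $a \leftarrow not\ b$ and $b \leftarrow not\ a$ has the answer sets $\{a\}$ and $\{b\}$ under this check, while the program consisting of the facts $a \leftarrow$ and $not\ a \leftarrow$ has none.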

A dynamic logic program (DLP) is a sequence of generalized logic programs. Let
$\mathcal{P} = (P_1, \ldots, P_s)$, $\mathcal{P}' = (P'_1, \ldots, P'_n)$ and
$\mathcal{P}'' = (P''_1, \ldots, P''_s)$ be DLPs. We use $\rho(\mathcal{P})$ to denote the multiset
of all rules appearing in the programs $P_1, \ldots, P_s$, and $(\mathcal{P}, \mathcal{P}')$ to
denote $(P_1, \ldots, P_s, P'_1, \ldots, P'_n)$ and $\mathcal{P} \cup \mathcal{P}''$ to denote
$(P_1 \cup P''_1, \ldots, P_s \cup P''_s)$.

3 Semantics of Updates 

In this Section we set forth the definitions of several semantics for dynamic logic pro- 
grams in a uniform manner. Subsequently we establish that some of these semantics 
either coincide or generalize the semantics proposed in the literature. Without loss of 
generality, we only consider the semantics at the last state of the DLP. The common 
aspect that relates the semantics for updates addressed here, known as those based on 
causal rejection of rules, is their relying on the notion that a newer rule that is in conflict 
with an older one may reject it to avoid contradiction. We start by defining the notion
of conflicting rules as follows: two rules $r$ and $r'$ are conflicting, denoted by $r \bowtie r'$,
iff $H(r) = not\ H(r')$. Note that we do not establish a pair of rules whose heads are
the strong negation of one another as being conflicting. Intuitively, these rules should
be regarded as conflicting. They are not explicitly stated as such since, by using the
expanded versions of GLPs, we have that for every pair of rules $r_1$ and $r_2$ in a DLP
such that $H(r_1) = \neg H(r_2)$ there are two rules $r'_1$ and $r'_2$, introduced by the expansion
operation, that are conflicting with the original ones, i.e., $r_1 \bowtie r'_2$ and
$r'_1 \bowtie r_2$. As will become clear below, this is enough to accomplish the desired rejection
when a newer rule, whose head is the strongly negated atom of the rule being rejected, exists.

Next we define three notions of rejected rules in a DLP. Intuitively, when we consider 
an interpretation, a rule that belongs to a program of a DLP should be rejected if there 
is a newer conflicting rule whose body is true in the interpretation. This amounts to one 
notion of rejection. The second builds upon the first and allows for rules of the same 
state to reject each other. Intuitively this may seem odd but, as will be seen below, it 
produces desirable results. The third notion builds upon the first to further impose that 
the rejector rule is itself not rejected. Intuitively this condition seems reasonable but, as 
will be seen below, it produces results that may not be desirable. 

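Definition 1 (Rejected Rules). Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP and $M$ an
interpretation. We define:

$Rej(\mathcal{P}, M) = \{r \mid r \in P_i, \exists r' \in P_j, i < j, r \bowtie r', M \models B(r')\}$

$Rej^{+}(\mathcal{P}, M) = \{r \mid r \in P_i, \exists r' \in P_j, i \le j, r \bowtie r', M \models B(r')\}$

$Rej^{*}(\mathcal{P}, M) = \{r \mid r \in P_i, \exists r' \in P_j \setminus Rej^{*}(\mathcal{P}, M), i < j, r \bowtie r', M \models B(r')\}$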

Before we define the semantics, we turn our attention to the notion of default
assumptions. In Section 2, when we defined the semantics of answer-sets for GLPs, we
purposely did it in a somewhat non-standard way. Instead of using the standard GL
transformation [6] (or, to be more precise, its modified version for GLPs [12]), we
explicitly add the default assumptions (default literals for all those objective literals that
do not belong to the considered interpretation) to the program and then determine its
least model. Similarly, when determining the semantics for DLPs, one also considers
the least model of a program, consisting of all rules belonging to the programs of the
DLP, without the rejected rules, together with a set of default assumptions. But unlike
for the answer-set semantics above, not all the semantics for DLPs will allow for the
explicit addition of every default literal whose corresponding objective literal does not
belong to the interpretation being considered. Instead, for such DLP semantics, the
default assumptions ($not\ A$) are restricted to those corresponding to unsupported objective
literals, i.e., those objective literals $A$ for which there is no rule in the DLP whose head
is $A$ and whose body is true in the interpretation being considered. The intuition behind this
is that if there exists a rule in the DLP (rejected or not) that would support the truth of some
objective literal $A$, then $A$ should not be able to be assumed false by default. Instead, its
falsity, to exist, should only be obtained by some newer rule $r$ that forces it to be false, i.e.,
one with $H(r) = not\ A$ and $M \models B(r)$ (where $M$ is the interpretation being considered).
Otherwise, we can have situations where the assumption that some objective literal $A$ is
false indirectly leads to the rejection of a fact $A$, as will be shown below. Since not all
semantics impose this restriction, we define two notions of default assumptions:

Definition 2 (Default Assumptions). Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP and $M$ an
interpretation. Define (where $A$ is an objective literal):
$Def^{*}(\mathcal{P}, M) = \{not\ A \mid A \notin M\}$ and
$Def(\mathcal{P}, M) = \{not\ A \mid \nexists r \in \rho(\mathcal{P}), H(r) = A, M \models B(r)\}$.

Using each combination of rejected rules and default assumptions, we are now ready
to define six distinct semantics for DLPs, all of which are based on the intuition that some
interpretation is a model according to a semantics iff it obeys an equation based on the
least model of the multiset of all the rules in the (expanded) DLP, without those rejected
rules, together with a set of default assumptions. The six semantics are dubbed dynamic
stable model semantics (DSM), dynamic justified update semantics (DJU), dynamic
u-model semantics (DUM), dynamic answer-set semantics (DAS), refined dynamic stable
model semantics (RDSM) and refined dynamic justified update semantics (RDJU).

Definition 3 (Semantics of Updates). Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP and $M$ an
interpretation. Then:

DSM: $M$ is a dynamic stable model of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej(\mathcal{P}, M) \cup Def(\mathcal{P}, M))$

DJU: $M$ is a dynamic justified update of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej(\mathcal{P}, M) \cup Def^{*}(\mathcal{P}, M))$

DUM: $M$ is a dynamic u-model of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej^{*}(\mathcal{P}, M) \cup Def(\mathcal{P}, M))$

DAS: $M$ is a dynamic answer-set of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej^{*}(\mathcal{P}, M) \cup Def^{*}(\mathcal{P}, M))$

RDSM: $M$ is a refined dynamic stable model of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej^{+}(\mathcal{P}, M) \cup Def(\mathcal{P}, M))$

RDJU: $M$ is a refined dynamic justified update of $\mathcal{P}$ iff

$M' = least(\rho(\mathcal{P}) - Rej^{+}(\mathcal{P}, M) \cup Def^{*}(\mathcal{P}, M))$

where $M'$, $\rho(\cdot)$ and $least(\cdot)$ are as before. Let $DSM(\mathcal{P})$ denote the set of
all dynamic stable models of $\mathcal{P}$, $DJU(\mathcal{P})$ the set of all dynamic justified
updates of $\mathcal{P}$, $DUM(\mathcal{P})$ the set of all dynamic u-models of $\mathcal{P}$,
$DAS(\mathcal{P})$ the set of all dynamic answer-sets of $\mathcal{P}$, $RDSM(\mathcal{P})$ the set
of all refined dynamic stable models of $\mathcal{P}$, and $RDJU(\mathcal{P})$ the set of all
refined dynamic justified updates of $\mathcal{P}$.

We now relate these semantics with those defined in the literature. 

Proposition 1. Let $\mathcal{P}$ be a DLP. $M$ is a dynamic stable model of $\mathcal{P}$ iff $M$
is a stable model of $\mathcal{P}$ (as of [2, 8]).

Remark 1. Strictly speaking, if we consider the definitions in [2], the result only holds
for normal generalized logic programs (i.e. without strong negation). This is so because
of an error in the original definition which has been corrected in [8].

Proposition 2. Let $\mathcal{P}$ be a DLP. $M$ is a dynamic justified update of $\mathcal{P}$ iff
$M$ is a $\mathcal{P}$-justified update (as of [10, 9]).

Remark 2. The semantics of $\mathcal{P}$-justified updates was originally established in a setting
where Revision Programs [13] were used instead of GLPs. We consider the trivial
correspondence between GLPs and Revision Programs where each $in(A)$ (resp. $out(A)$)
of the latter corresponds to an $A$ (resp. $not\ A$) of the former.

The dynamic answer-set semantics generalizes the update answer-set semantics [4],
originally defined for DLPs consisting of extended logic programs only, i.e., without
default negation in the heads of rules, to the case of DLPs consisting of GLPs.

Proposition 3. Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP where each $P_i$ is an extended
logic program. $M$ is a dynamic answer-set of $\mathcal{P}$ iff $M$ is an update answer-set
(as of [4]).

In [1] the use of strong negation is not allowed. The refined dynamic stable model
semantics defined here extends the semantics of [1] to allow for strong negation.

Proposition 4. Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP where each $P_i$ is a normal
generalized logic program (i.e. without strong negation). $M$ is a refined dynamic stable model of
$\mathcal{P}$ iff $M$ is a refined dynamic stable model (as of [1]).

The dynamic u-model semantics and the refined dynamic justified update semantics 
do not correspond to any previously defined semantics. Their definitions are only justified 
for the sake of completeness since they both suffer from some drawbacks. 

4 Properties and Comparisons 

In this Section we compare all six semantics by means of some properties and examples 
that illustrate their similarities and differences. We start by relating all semantics with 
the answer-set semantics: 

Proposition 5 (Generalization of Answer-Set Semantics). Let $\mathcal{P} = (P)$ be a DLP
consisting of a single GLP. Then
$DSM(\mathcal{P}) = DJU(\mathcal{P}) = DUM(\mathcal{P}) = DAS(\mathcal{P}) = RDSM(\mathcal{P}) = AS(P)$.

A similar proposition does not hold for the refined dynamic justified update semantics
($P = \{a \leftarrow;\ not\ a \leftarrow\}$ serves as a counter-example). Since this semantics fails
to obey this simple and desirable property, we will not consider it further in this paper. Next we
relate the sets of models that characterize each semantics:

Theorem 1. Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP. Then
$RDSM(\mathcal{P}) \subseteq DSM(\mathcal{P}) \subseteq DJU(\mathcal{P}) \subseteq DAS(\mathcal{P})$ and
$RDSM(\mathcal{P}) \subseteq DSM(\mathcal{P}) \subseteq DUM(\mathcal{P}) \subseteq DAS(\mathcal{P})$.

Example 1. Let $\mathcal{P} = (P_1, P_2, P_3)$ be the DLP where $P_1 = \{a \leftarrow\}$,
$P_2 = \{not\ a \leftarrow\}$, and $P_3 = \{a \leftarrow a\}$. We obtain
$RDSM(\mathcal{P}) = DSM(\mathcal{P}) = DJU(\mathcal{P}) = \{\{\}\}$ and
$DUM(\mathcal{P}) = DAS(\mathcal{P}) = \{\{\}, \{a\}\}$.

This example serves to illustrate the difference in the results caused by allowing
rejected rules to reject rules or otherwise. For those semantics that use $Rej(\mathcal{P}, M)$ and
$Rej^{+}(\mathcal{P}, M)$ (RDSM, DSM and DJU), we only obtain one model, namely $\{\}$. Since,
with $Rej(\mathcal{P}, M)$ and $Rej^{+}(\mathcal{P}, M)$, rejected rules can reject other rules, the
rule $a \leftarrow$ in $P_1$ is always rejected by the rule $not\ a \leftarrow$ in $P_2$,
independently of the interpretation $M$ being considered. Then, since there are no rules that
support $a$ (the rule $a \leftarrow a$ in $P_3$ is not sufficient, by itself), we cannot have a
model such that $a$ belongs to it. The interpretation $\{\}$ is then the only one that verifies the
condition to be a model. Note that this is valid for RDSM, DSM and DJU. For those semantics that
use $Rej^{*}(\mathcal{P}, M)$ (DUM and DAS), since rejected rules are not allowed to reject other
rules, the rejection of the rule $a \leftarrow$ in $P_1$ now depends on the rule
$not\ a \leftarrow$ in $P_2$ being rejected or not. If we consider the interpretation $\{\}$, then
the rule $a \leftarrow$ is rejected and $\{\}$ verifies the condition to be a model. If we consider
the interpretation $\{a\}$, then the rule $a \leftarrow a$ in $P_3$ rejects the rule
$not\ a \leftarrow$ in $P_2$ and, consequently, $a \leftarrow$ in $P_1$ is no longer rejected.
Since $least(\{a \leftarrow;\ a \leftarrow a\}) = \{a\}$, we have that $\{a\}$ is also a model
according to DUM and DAS. By inspecting $\mathcal{P}$, we argue that the intuitive result should be
to allow for one model only, namely $\{\}$. Since we are dealing with updates, the rule
$not\ a \leftarrow$ in $P_2$ should be understood as a change in the world into a state where $a$ is
unconditionally not true. Therefore, we should not be able to reuse the rule $a \leftarrow$ in
$P_1$ again. According to DUM and DAS, this old rule in $P_1$ serves as the support for itself not
to be rejected, i.e., it serves as the support for $a$, rendering the rule $a \leftarrow a$ in
$P_3$ one that rejects $not\ a \leftarrow$ in $P_2$, which is therefore not able to reject
$a \leftarrow$ in $P_1$. The reader may find this line of argumentation more convincing after
replacing the proposition $a$ with the proposition $alive$. Below, we come back to this issue when
we define a property that encodes the intuition behind it, which holds for RDSM, DSM and DJU, but
not for DUM and DAS.
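For small propositional DLPs, the model sets in Example 1 can be recomputed by brute force from Definitions 1 to 3. The Haskell sketch below is one possible reading of those definitions, not an implementation from the paper or the cited literature: all names (`RejKind`, `DefKind`, `rejected`, `defaults`, `models`, and so on) are assumptions made for illustration, each program is expanded before rules are collected, and candidate interpretations are enumerated exhaustively.

```haskell
import Data.List (nub, sort, subsequences)

data Obj  = Pos String | Neg String deriving (Eq, Ord, Show)
data Lit  = O Obj | Not Obj deriving (Eq, Ord, Show)
data Rule = Rule { hd :: Lit, body :: [Lit] } deriving (Eq, Show)
type DLP  = [[Rule]]                 -- P_1, ..., P_s, oldest first

data RejKind = Rej | RejPlus | RejStar
data DefKind = Def | DefStar

compl :: Obj -> Obj
compl (Pos a) = Neg a
compl (Neg a) = Pos a

-- Default complement of a head: not H(r).
notL :: Lit -> Lit
notL (O a)   = Not a
notL (Not a) = O a

holds :: [Obj] -> Lit -> Bool
holds m (O a)   = a `elem` m
holds m (Not a) = a `notElem` m

-- Rules of the expanded DLP, tagged with their state index.
indexed :: DLP -> [(Int, Rule)]
indexed ps = [ (i, r) | (i, p) <- zip [1 ..] ps
                      , r <- p ++ [Rule (Not (compl h)) b | Rule (O h) b <- p] ]

rejected :: RejKind -> DLP -> [Obj] -> [(Int, Rule)]
rejected kind ps m = case kind of
    Rej     -> [x | x <- rs, any (beats (<) x) rs]
    RejPlus -> [x | x <- rs, any (beats (<=) x) rs]
    RejStar -> foldr step [] rs
  where
    rs = indexed ps
    beats cmp (i, r) (j, r') =
      i `cmp` j && hd r == notL (hd r') && all (holds m) (body r')
    -- foldr visits the newest state first, so every potential rejector
    -- has already been classified when an older rule is examined.
    step x acc
      | any (\y -> y `notElem` acc && beats (<) x y) rs = x : acc
      | otherwise                                        = acc

defaults :: DefKind -> DLP -> [Obj] -> [Obj] -> [Lit]
defaults DefStar _  m univ = [Not a | a <- univ, a `notElem` m]
defaults Def     ps m univ =
  [ Not a | a <- univ
          , not (any (\r -> hd r == O a && all (holds m) (body r)) (concat ps)) ]

-- Least model of a definite program (default literals read as fresh atoms).
least :: [Rule] -> [Lit]
least rs = go []
  where
    go m = let m' = sort (nub (m ++ [hd r | r <- rs, all (`elem` m) (body r)]))
           in if m' == m then m else go m'

-- M' = least(rho(P) - Rej(P, M)  U  Def(P, M)) for the chosen variants.
isModel :: RejKind -> DefKind -> [Obj] -> DLP -> [Obj] -> Bool
isModel rk dk univ ps m =
    all (\l -> compl l `notElem` m) m && sort m' == least (kept ++ facts)
  where
    rej   = rejected rk ps m
    kept  = [r | x@(_, r) <- indexed ps, x `notElem` rej]
    facts = [Rule d [] | d <- defaults dk ps m univ]
    m'    = map O m ++ [Not a | a <- univ, a `notElem` m]

models :: RejKind -> DefKind -> [String] -> DLP -> [[Obj]]
models rk dk atoms ps = [m | m <- subsequences univ, isModel rk dk univ ps m]
  where univ = concat [[Pos a, Neg a] | a <- atoms]
```

Under these assumptions, DSM corresponds to `models Rej Def`, DJU to `models Rej DefStar`, DUM to `models RejStar Def`, DAS to `models RejStar DefStar` and RDSM to `models RejPlus Def`. Encoding Example 1 as `[[Rule (O a) []], [Rule (Not a) []], [Rule (O a) [O a]]]` with `a = Pos "a"`, the first, second and fifth yield only the empty interpretation, while the two `RejStar` variants additionally accept $\{a\}$, in agreement with the results stated in the example.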

Example 2. Consider $\mathcal{P} = (P_1, P_2)$ to be the DLP where $P_1 = \{a \leftarrow\}$ and
$P_2 = \{not\ a \leftarrow not\ a\}$. We obtain
$RDSM(\mathcal{P}) = DSM(\mathcal{P}) = DUM(\mathcal{P}) = \{\{a\}\}$ and
$DJU(\mathcal{P}) = DAS(\mathcal{P}) = \{\{\}, \{a\}\}$.

With this example we can observe the different results obtained by using either
$Def(\mathcal{P}, M)$ or $Def^{*}(\mathcal{P}, M)$. When $Def(\mathcal{P}, M)$ is used (RDSM, DSM
and DUM), then only one model is obtained, namely $\{a\}$. If we use $Def^{*}(\mathcal{P}, M)$ (DJU
and DAS), the model $\{\}$ also exists. This model is obtained by assuming $a$ to be false by
default, i.e., $not\ a \in Def^{*}(\mathcal{P}, M)$. This default assumption justifies the use of the
rule $not\ a \leftarrow not\ a$ in $P_2$ to reject the fact $a \leftarrow$ in $P_1$. Arguably, if
we look at the rule $not\ a \leftarrow not\ a$ as a tautological one, it is fair to expect it not to
be responsible, by itself, for a change in the semantics. We argue that this DLP should only have
one model, namely $\{a\}$.

The previous example leads to the next property, which serves as an argument in favor
of using $Def(\mathcal{P}, M)$ instead of $Def^{*}(\mathcal{P}, M)$. It relates these semantics
with Revision Programming [13]. This framework, which for lack of space we cannot formally
recapitulate here, characterizes the interpretations that should be accepted as the result of
the update of an initial interpretation by a revision program. Such accepted interpretations
are called $M$-justified updates. If we encode the initial interpretation as a purely
extensional GLP and make it the first program of a DLP with two programs, where the
second encodes the revision program, then it is desirable that a semantics for the DLP
coincide with the one provided by the interpretation updates. It turns out that only the
three semantics that use $Def(\mathcal{P}, M)$ coincide with interpretation updates.

Definition 4 (Generalization of Interpretation Updates). Let $M$ be an interpretation
and $RP$ a revision program (according to [13]). Let $P_{RP}$ be the GLP obtained from $RP$
by replacing atoms of the form $in(L)$ (resp. $out(L)$) with $L$ (resp. $not\ L$). Let
$P_M = \{A \leftarrow \mid A \in M\}$. We say that an update semantics $SEM$ generalizes
Interpretation Updates (IU) (in the sense of [13]) iff for every $M$ and $RP$ it holds that an
interpretation $I$ is an $M$-justified update iff it is a model of $(P_M, P_{RP})$ according to
$SEM$, i.e., $I \in SEM((P_M, P_{RP}))$.

Theorem 2 (Generalization of Interpretation Updates). RDSM, DSM and DUM
generalize IU. DJU and DAS do not generalize IU.

The DLP in Example 2 serves as a counter-example to show that DJU and DAS do
not generalize IU because, according to the latter, $\{a\}$ is the only $M$-justified update.

From the exposition so far, it becomes apparent that the differences between the 
semantics are very subtle, and all related to the extent that these semantics are immune 
to tautologies. Immunity to tautologies is desirable and can be defined as follows: 

Definition 5 (Immunity to Tautologies). An update semantics $SEM$ is immune to
tautologies iff for any DLP $\mathcal{P} = (P_1, \ldots, P_s)$ and any sequence of sets of
tautologies $\mathcal{E} = (E_1, \ldots, E_s)$, it holds that
$SEM(\mathcal{P}) = SEM(\mathcal{P} \cup \mathcal{E})$.

Theorem 3. DSM, DUM, DJU and DAS are not immune to tautologies.

Example 3. Consider the DLP $\mathcal{P} = (P_1, P_2)$ where
$P_1 = \{a \leftarrow;\ not\ a \leftarrow\}$ and $P_2 = \{a \leftarrow a\}$. All of DSM, DUM, DJU
and DAS have a single model, namely $\{a\}$. According to these four semantics, if one program is
contradictory, a tautological update has the effect of removing such contradiction.

Theorem 4. [1] RDSM is immune to tautologies.

Before we proceed, we establish a notion of equivalence between two DLPs under
some update semantics. Intuitively two DLPs are equivalent if they have the same
semantics and, when both are updated by the same arbitrary sequence of programs (i.e.
the updating sequence is appended to both DLPs), their semantics still coincide.

Definition 6 (Update Equivalence). Two DLPs $\mathcal{P}_\alpha$ and $\mathcal{P}_\beta$ are
update equivalent under semantics $SEM$, denoted by
$\mathcal{P}_\alpha \overset{SEM}{\equiv} \mathcal{P}_\beta$, iff for every DLP $\mathcal{P}$, it
holds that $SEM((\mathcal{P}_\alpha, \mathcal{P})) = SEM((\mathcal{P}_\beta, \mathcal{P}))$.

This notion of equivalence is important inasmuch as it allows us to replace a DLP 
that describes the history of some world by a simpler one, if we are not concerned with 
the past history, but we want to guarantee that the present and future are preserved. 
Ideally, we would like to devise, for each semantics, an operator that would condense 
any two consecutive programs belonging to a DLP into a single one, written in the same 
language. By repeatedly applying such an operator we would reduce a DLP to a single 
GLP. Such operators have the following formal definition: 

Definition 7 (General State Condensing Operator). Let $\mathcal{A}$ be a set of propositional
atoms. Let $\Pi$ denote the set of all generalized logic programs over the set of atoms
$\mathcal{A}$. Let $\oplus$ be an operator with signature $\oplus : \Pi \times \Pi \to \Pi$. We say
that $\oplus$ is a general state condensing operator for language $\mathcal{A}$ and semantics
$SEM$ iff for every DLP $\mathcal{P} = (P_1, \ldots, P_s)$ over $\mathcal{A}$ it holds that
$\mathcal{P} \overset{SEM}{\equiv} (P_1, \ldots, P_{i-1}, \oplus(P_i, P_{i+1}), P_{i+2}, \ldots, P_s)$.

Theorem 5. Let $\mathcal{A}$ be a non-empty language. General state condensing operators for
language $\mathcal{A}$ and semantics RDSM, DSM, DJU, DUM, and DAS do not exist.

Example 4. Let $\mathcal{P} = (P_1, P_2)$ be the DLP, over $\mathcal{A} = \{a, b\}$, where
$P_1 = \{a \leftarrow b;\ b \leftarrow\}$ and $P_2 = \{not\ b \leftarrow not\ a\}$. We have
$RDSM(\mathcal{P}) = DSM(\mathcal{P}) = DJU(\mathcal{P}) = DUM(\mathcal{P}) = DAS(\mathcal{P}) = \{\{\}, \{a, b\}\}$.
Since a general state condensing operator $\oplus$ would condense $P_1$ and $P_2$ into a single
program $\oplus(P_1, P_2)$, and all five update semantics coincide with the answer-set semantics
for DLPs with a single program, $AS(\oplus(P_1, P_2))$ would have to be equal to
$\{\{\}, \{a, b\}\}$. But there is no generalized logic program whose answer sets are $\{\}$ and
$\{a, b\}$, because $\{\} \subset \{a, b\}$ and it is known that answer-sets are minimal. Similar
examples written in a language containing just one propositional atom exist, from which we prove
the theorem.

The definition for general state condensing operators requires the language in which 
the resulting program is written to be the same as that of the original DLP. If we allow 
extensions to the original language, then there exists, for each of the semantics intro- 
duced, a polynomial transformation that takes a DLP and produces a single GLP, written 
in an extended language, whose answer-sets (when restricted to the initial language) 
coincide with the models of the update semantics. The existence of these transforma- 
tions allows for the use of available answer-set software (e.g. SMODELS [14] and DLV 
[11]) to determine the update semantics, by means of a preprocessor that implements 
the transformation. Some of these implementations are publicly available. We do not 
present such transformations here for lack of space (some are in the cited literature), but 
their existence suffices to establish the following complexity results for all the semantics 
(inherited from those of Logic Programming under the Answer-set semantics): 

Theorem 6 (Computational Complexity). Let $\mathcal{P}$ be a DLP. Deciding if
$DSM(\mathcal{P})$ (resp. $DJU(\mathcal{P})$, $DUM(\mathcal{P})$, $DAS(\mathcal{P})$,
$RDSM(\mathcal{P})$) is not empty is NP-complete. Deciding if an interpretation belongs to
$DSM(\mathcal{P})$ (resp. $RDSM(\mathcal{P})$, $DUM(\mathcal{P})$, $DAS(\mathcal{P})$,
$DJU(\mathcal{P})$) is in P. Deciding if an atom is true in at least one interpretation
that belongs to $DSM(\mathcal{P})$ (resp. $DJU(\mathcal{P})$, $DUM(\mathcal{P})$,
$DAS(\mathcal{P})$, $RDSM(\mathcal{P})$) is NP-complete. Deciding if an atom is true in all
interpretations that belong to $DSM(\mathcal{P})$ (resp. $DJU(\mathcal{P})$, $DUM(\mathcal{P})$,
$DAS(\mathcal{P})$, $RDSM(\mathcal{P})$) is coNP-complete.

Since the size of the program obtained by the mentioned transformations depends 
on the number of rules and the number of states of the DLP, we now address ways of 
simplifying a DLP both by eliminating certain states and certain rules, while preserving 
update equivalence. 

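Definition 8 (State Elimination). Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP. Define:

SEa: If $P_i = \{\}$, then
$\mathcal{P} \overset{SEM}{\equiv} (P_1, \ldots, P_{i-1}, P_{i+1}, \ldots, P_s)$;

SEb: If $\nexists r \in P_i, r' \in P_{i+1} : r \bowtie r'$, then
$\mathcal{P} \overset{SEM}{\equiv} (P_1, \ldots, P_{i-1}, P_i \cup P_{i+1}, P_{i+2}, \ldots, P_s)$.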



The first one, SEa, encodes that updates by empty programs should not change the 
semantics and should be allowed to be removed. SEb encodes that two consecutive 
programs that have no conflicting rules should be able to be merged into a single one. 
We now proceed to state simplification i.e. (syntactical) removal of superfluous rules: 

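Definition 9 (State Simplification). Let $\mathcal{P} = (P_1, \ldots, P_s)$ be a DLP. Define:

SSa: If $r \in P_i$ and $\exists r' \in P_j, i < j, H(r') = H(r)$ and $B(r') \subseteq B(r)$, then
$\mathcal{P} \overset{SEM}{\equiv} (P_1, \ldots, P_i \setminus \{r\}, \ldots, P_s)$;

SSb: If $r \in P_i$ and $\exists r' \in P_j, i < j, H(r') = not\ H(r)$ and
$B(r') \subseteq B(r)$, then
$\mathcal{P} \overset{SEM}{\equiv} (P_1, \ldots, P_i \setminus \{r\}, \ldots, P_s)$.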

The first one, SSa, encodes that one should be able to remove an older rule for some 
literal, if a newer rule for that literal exists and is equal or more general (i.e. its body 
is a subset of the older rule’s body). SSb encodes that one should be able to remove an 
older rule for some literal if a newer rule for its default complement exists and its body 
is equal or a subset of the older rule’s body. SSb seems intuitive because if the body of 
the newer rule is always true when the body of the older rule also is, then the conclusion 
of the newer rule should always prevail over the conclusion of the older one. 

Theorem 7. SEa and SEb hold for RDSM, DSM, DJU, DUM and DAS. SSa holds 
for RDSM, DSM, DJU, DUM and DAS. SSb holds for RDSM, DSM and DJU. 
SSb does not hold for DUM and DAS. 
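Read purely syntactically, SSa and SSb describe a filter over the older programs of a DLP. The Haskell sketch below shows that filter as a single pass that tests every rule against the newer programs of the original DLP. The names `Simp` and `simplify` and the list-based encoding are assumptions made for illustration. The code performs the removal only; whether the result is update equivalent to the input depends on the semantics, as stated in Theorem 7.

```haskell
data Obj  = Pos String | Neg String deriving (Eq, Show)
data Lit  = O Obj | Not Obj deriving (Eq, Show)
data Rule = Rule { hd :: Lit, body :: [Lit] } deriving (Eq, Show)
type DLP  = [[Rule]]                 -- P_1, ..., P_s, oldest first

-- Which simplification of Definition 9 to apply.
data Simp = SSa | SSb

-- Default complement of a head: not H(r).
notL :: Lit -> Lit
notL (O a)   = Not a
notL (Not a) = O a

subsetOf :: [Lit] -> [Lit] -> Bool
subsetOf xs ys = all (`elem` ys) xs

-- Drop every rule r of P_i for which some rule r' of a newer P_j (i < j) has
-- B(r') as a subset of B(r) and head H(r) (SSa) or not H(r) (SSb).
simplify :: Simp -> DLP -> DLP
simplify s ps =
    [ [r | r <- p, not (superfluous i r)] | (i, p) <- zip [1 ..] ps ]
  where
    newer i = concat (drop i ps)     -- all rules of P_j with j > i
    wanted r = case s of
      SSa -> hd r
      SSb -> notL (hd r)
    superfluous i r =
      any (\r' -> hd r' == wanted r && body r' `subsetOf` body r) (newer i)
```

Applied with `SSb` to the DLP of Example 1, the function removes the fact $a \leftarrow$ from $P_1$ and leaves $P_2$ and $P_3$ untouched, since the body of $a \leftarrow a$ is not a subset of the empty body of $not\ a \leftarrow$.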



The following example illustrates why SSb does not hold for DUM and DAS, which
is related to their use of $Rej^{*}(\mathcal{P}, M)$ instead of $Rej(\mathcal{P}, M)$.

Example 5. Consider the DLP of Ex. 1. By removing rule $a \leftarrow$ from $P_1$ (due to rule
$not\ a \leftarrow$ in $P_2$) we obtain $\mathcal{P}' = (\{\}, P_2, P_3)$. Note that
$\mathcal{P} \overset{DSM}{\equiv} \mathcal{P}'$ and $\mathcal{P} \overset{DJU}{\equiv} \mathcal{P}'$,
but $\mathcal{P} \overset{DUM}{\not\equiv} \mathcal{P}'$ and
$\mathcal{P} \overset{DAS}{\not\equiv} \mathcal{P}'$. To confirm the latter negative case, just
observe that $DAS(\mathcal{P}) = \{\{\}, \{a\}\}$ and $DAS(\mathcal{P}') = \{\{\}\}$, i.e.
$DAS(\mathcal{P}) \neq DAS(\mathcal{P}')$. Likewise for DUM.

5 Discussion and Conclusions

From the definitions and results above, it becomes apparent that the differences between 
the semantics are very subtle, and all related to the extent that they are immune to tau- 
tologies 3 . Of the five semantics presented, the Refined Dynamic Stable Model semantics 
of [1] is the only one that is immune to tautologies. It turns out that such semantics 
goes a step further and is also immune to more elaborate tautological updates involving 
dependencies amongst more rules (c.f. [1]). 

Related to Logic Program Updates, we can find other semantics in the literature [17, 
15], although not following the causal rejection of rules approach. They follow a mixture 
of interpretation updates and rule based updates as they both determine the models of the 
theory before performing the update, yielding results that differ significantly from the 
ones described in this paper. In [3] the authors propose Disjunctive Logic Programs with 
Inheritance, which can be used to encode updates. In [4], the non-disjunctive fragment
is proved equivalent to the Update Answer Sets semantics.

Abstract. Recent advances in Computerized Numeric Control (CNC)
have allowed the manufacturing of products with high quality standards. 
Since CNC programs consist of a series of assembler-like instructions, 
several high-level languages (e.g., AutoLISP, APL, OMAC) have been 
proposed to raise the programming abstraction level. Unfortunately, the 
lack of a clean semantics prevents the development of formal tools for 
the analysis and manipulation of programs. In this work, we propose the 
use of Haskell for CNC programming. The declarative nature of Haskell 
provides an excellent basis to develop program analysis and manipulation 
tools and, most importantly, to formally prove their correctness. 



1 Introduction 

Computerized Numeric Control (CNC for short) machines have become the ba- 
sis of many industrial processes. CNC machines include robots, production lines, 
and all those machines that are controlled by digital devices. Typically, CNC 
machines have a machine control unit (MCU) which inputs a CNC program 
and controls the behavior and movements of all the parts of the machine. Cur- 
rently — as stated by the standard ISO 6983 [3] — CNC programs interpreted by 
MCUs are formed by an assembler-like code which is divided into single instruc- 
tions called G-codes (see Fig. 1 below). 

One of the main problems of CNC programming is its lack of portability. In
general, each manufacturer introduces some extension to the standard G-codes
in order to support the wide variety of functions and tools that CNC machines
provide. Thus, when trying to reuse a CNC program, programmers have to tune
it first for the MCU of their specific CNC machines. For example, even though
both CNC machines HASS VF-0 and DM2016 are milling machines, the G-codes
they accept are different because they belong to different manufacturers (e.g.,
the former is newer and is able to carry out a wider spectrum of tasks).

CNC programming is not an easy task since G-codes represent a low-level 
language without control statements, procedures, and many other advantages of 
modern high-level languages. In order to provide portability to CNC programs 
and to raise the abstraction level of the language, there have been several pro- 
posals of intermediate languages, such as APL [10] and OMAC [8], from which 
G-codes can be automatically generated with compilers and post-processors. Un- 
fortunately, the lack of a clean semantics in these languages prevents the devel- 
opment of formal tools for the analysis and manipulation of programs. Current 
CNC programming languages, such as Auto-Code [7] or AutoLISP [2, 13], allow
us to completely specify CNC programs but do not allow us to analyze program
properties such as termination, or to formally use heuristics when defining the
behavior of CNC machines (as happens with autonomous robots).

In this work, we propose the use of the pure functional language Haskell [12]
to design CNC programs. Our choice relies on the fact that Haskell is a modern
high-level language which provides a very convenient framework to produce
and (formally) analyze and verify programs. Furthermore, it has many useful
features such as, e.g., lazy evaluation (which allows us to cope with infinite data
structures), higher-order constructs (i.e. the use of functions as first-class
citizens, which allows us to easily define complex combinators), type classes for
arranging together types of the same kind, etc. This paper presents our proposal
for CNC programming using Haskell and shows the main advantages of Haskell
over current languages which are used for the same purpose.

This paper is organized as follows. In the next section, we provide a brief re- 
view of CNC programming. Section 3 introduces the functional language Haskell. 
In Sect. 4, we illustrate the use of Haskell to represent CNC programs and, then, 
in Sect. 5, we enumerate some advantages of choosing Haskell over other exist- 
ing languages. Some implementation details are discussed in Sect. 6 and, finally, 
conclusions and some directions for future work are presented in Sect. 7. 



2 CNC: A Brief Review 

Computer numerical control is the process of having a computer control
the operation of a machine [19]. CNC machines typically replace (or work in
conjunction with) some existing manufacturing processes. Almost all operations 
performed with conventional machine tools are programmable with CNC ma- 
chines. For instance, with CNC machines we can perform motion control in 
linear — along a straight line — or rotary — along a circular path — axes. 

A CNC program is composed of a series of blocks containing one or more
instructions, written in assembly-like format [14]. These blocks are executed in
sequential order, step by step. Each instruction has a special meaning, as it
gets translated into a specific order for the machine. Instructions usually begin
with a letter indicating the type of activity the machine is intended to do, like F
for feed rate, S for spindle speed, and X, Y and Z for axes motion. For any given CNC
machine type, there are about 40-50 instructions that can be used on a regular
basis. G words, commonly called G codes, are major address codes for preparatory
functions, which involve tool movement and material removal. These include
rapid moves, linear and circular feed moves, and canned cycles. M words,
commonly called M codes, are major address codes for miscellaneous functions
that perform various instructions not involving actual tool dimensional movement.
These include spindle on and off, tool changes, coolant on and off, and
other similar related functions. Most G and M-codes have been standardized,
but some of them still have a different meaning for particular controllers.

As mentioned earlier, a CNC program is composed of a series of blocks, where
each block can contain several instructions. For example, N0030 G01 X3.0 Y1.7
is a block with one instruction, instructing the machine to perform a movement (linear
interpolation) in the X and Y axes. Figure 1 shows an example of a simple CNC
program for drilling three holes in a straight line.

Fig. 1. Simple CNC Program
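To make the block structure concrete, the following Haskell sketch models a block as a sequence number plus a list of address words and renders it back to text. It is a hypothetical encoding written for this review of CNC syntax, not the representation the paper itself develops later. The names `Instr`, `Block`, `renderBlock` and `lineMoves`, the four-digit sequence numbers and the numbering in steps of 10 are all assumptions.

```haskell
-- A single word of a block: an address letter plus its numeric argument.
data Instr = G Int | M Int | X Double | Y Double | Z Double
           | F Double | S Double
  deriving (Eq, Show)

-- A block: a sequence number followed by one or more instructions.
data Block = Block { seqNo :: Int, instrs :: [Instr] } deriving (Eq, Show)

-- Zero-pad a non-negative number to at least w digits.
pad :: Int -> Int -> String
pad w n = replicate (w - length s) '0' ++ s
  where s = show n

renderInstr :: Instr -> String
renderInstr (G n) = 'G' : pad 2 n
renderInstr (M n) = 'M' : pad 2 n
renderInstr (X v) = 'X' : show v
renderInstr (Y v) = 'Y' : show v
renderInstr (Z v) = 'Z' : show v
renderInstr (F v) = 'F' : show v
renderInstr (S v) = 'S' : show v

renderBlock :: Block -> String
renderBlock (Block n ws) = unwords (('N' : pad 4 n) : map renderInstr ws)

renderProgram :: [Block] -> String
renderProgram = unlines . map renderBlock

-- Hypothetical helper: k linear moves to equally spaced points along the
-- X axis, with blocks numbered in steps of 10.
lineMoves :: Double -> Int -> [Block]
lineMoves step k =
  [ Block (10 * i) [G 1, X (step * fromIntegral i), Y 0] | i <- [1 .. k] ]
```

With these definitions, `renderBlock (Block 30 [G 1, X 3.0, Y 1.7])` reproduces the block `N0030 G01 X3.0 Y1.7` quoted above, and a list comprehension such as the one in `lineMoves` already hints at how repetitive blocks can be generated rather than written by hand.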



3 An Overview of Haskell 

Haskell is a general-purpose, purely functional programming language that provides
higher-order functions, non-strict semantics, static polymorphic typing,
user-defined algebraic datatypes, pattern matching, list comprehensions, a monadic
input/output system, and a rich set of primitive datatypes [12]. In Haskell,
functions are defined by a sequence of rules of the form

    $f\ t_1 \ldots t_n = e$

where $t_1, \ldots, t_n$ are constructor terms and the right-hand side $e$ is an expression.
The left-hand side must not contain multiple occurrences of the same variable.
Constructor terms may contain variables and constructor symbols, i.e., symbols
which are not defined by the program rules. Functions can also be defined by
conditional equations which have the form

    $f\ t_1 \ldots t_n \mid c = e$

where the condition (or guard) $c$ must be a Boolean expression. Function definitions
have a (perhaps implicit) declaration of their type, of the form

    $f :: a_1 \to a_2 \to \cdots \to a_n \to b$

which means that $f$ takes $n$ elements of types $a_1, \ldots, a_n$ and returns an element
of type $b$. For example, the following function returns the length of a given list:

    length :: [a] -> Int
    length []     = 0
    length (_:xs) = 1 + length xs

Note that in this example, "a" is a type variable which stands for any type.
Local declarations can be defined by using the let or where constructs. For
example, the following function returns True if the first argument is smaller than
or equal to the length of the list being passed as a second argument, or False otherwise:

    indexChecker :: Int -> [a] -> Bool
    indexChecker n xs = n <= l where l = length xs
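Conditional equations and local declarations combine naturally. The following sketch is an illustrative addition (the function `safeIndex` is an assumption and does not appear in the paper): it uses guards together with a `where` clause to return the element at a zero-based position only when that position exists.

```haskell
-- Guards are tried from top to bottom; `otherwise` is simply True.
safeIndex :: Int -> [a] -> Maybe a
safeIndex n xs
  | n < 0     = Nothing
  | n >= len  = Nothing
  | otherwise = Just (xs !! n)
  where
    len = length xs   -- local declaration shared by all guards
```

For example, `safeIndex 1 "abc"` evaluates to `Just 'b'`, whereas `safeIndex 3 "abc"` evaluates to `Nothing`.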

A Haskell function is higher-order if it takes a function as an argument, 
returns a function as a result, or both. For instance, map is a higher-order function 
that applies a given function to each element in a list: 

    map :: (a -> b) -> [a] -> [b]
    map f []     = []
    map f (x:xs) = f x : map f xs

Haskell provides a static type semantics, and even though it has several prim- 
itive datatypes (such as integers and floating-point numbers), it also provides a 
way of defining our own (possibly recursive) types using data declarations: 

data Bool = False | True

data Tree a = Leaf a | Branch (Tree a) (Tree a)
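
As a small illustration of how such a recursive type is consumed, the following sketch defines a function over Tree by pattern matching. The function name leaves is an assumption made for this example only.

```haskell
data Tree a = Leaf a | Branch (Tree a) (Tree a)

-- Hypothetical helper: collect the values stored in the leaves, left to right.
-- One equation per constructor mirrors the shape of the data declaration.
leaves :: Tree a -> [a]
leaves (Leaf x)     = [x]
leaves (Branch l r) = leaves l ++ leaves r
```

Because leaves is polymorphic in a, the same definition works for trees of any element type.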

For convenience, Haskell also provides a way to define type synonyms, i.e.,
names for commonly used types. Type synonyms are created using a type declaration.
Here are some examples:

type String = [Char]
type Name = String
type Person = (Name, Address)
data Address = None | Addr String

Type classes (or just classes) in Haskell provide a structured way to introduce 
overloaded functions that must be supported by any type that is an instance of 
that class. For example, the Equality class in Haskell is defined as follows: 

class Eq a where
  (==) :: a -> a -> Bool
G01: Moves the turret chuck along the XYZ axes. It can be followed by XYZ codes.
G90: Indicates that absolute positioning is being used. 

G91: Indicates that incremental positioning is being used. 

X(-)nn: used to move the turret chuck along the X axis 
Y(-)nn: used to move the turret chuck along the Y axis 
Z(-)nn: used to move the turret chuck along the Z axis 

where nn indicates: 

— the new absolute position in the corresponding axis, where (0,0,0) is a given 
reference point over the table ( absolute positioning). 

— the number of units in the current axis that the tool is being shifted ( incre- 
mental positioning). 



Fig. 2. Instructions set for simple CNC drilling machine 



A type is made an instance of a class by defining the signature functions for 
the type, e.g., in order to make Address an instance of the Equality class: 

instance Eq Address where
  None     == None     = True
  Addr st1 == Addr st2 = st1 == st2
  _        == _        = False

where _ is a wildcard, used to introduce a default case. 

We refer the interested reader to the report on the Haskell language [12] for 
a detailed description of all the features of the pure functional language Haskell. 



4 Using Haskell for CNC Programming 

In this section, we illustrate the use of Haskell for CNC programming. For lack of
space, we consider a simple CNC drilling machine which can only move the turret
chuck along the X and Y axes (in order to position the drill bit), and along the Z axis
(in order to make the hole). The machine also handles absolute and incremental
positioning of the turret chuck (see footnote 1).

A CNC program for this machine consists of a header and a body. The header is 
optional and is usually a short comment, whilst the body is a list of blocks, where 
each block is identified by a number (Nnnnn) and can contain either one or more 
instructions or a comment, where comments are always parenthesized. 

An instruction can contain one of the CNC codes shown in Fig. 2. In the follow- 
ing, we consider millimeters as measurement units. 

For instance, a CNC program for drilling two holes at positions (10,10) and 
(15,15) is as follows: 



1 A full example with the implementation of the complete data structures needed to
represent CNC programs can be found at http://www.dsic.upv.es/~jsilva/cnc.
Fig. 3. Haskell data structures for representing a CNC program
In this example, block N0010 is a comment, N0020 instructs the CNC
machine to use absolute positioning, N0030 moves the turret chuck to 5mm above the
table in the Z axis, N0040 positions the turret chuck at the (10,10) coordinate
(note that X10 Y10 is a shortcut for G01 X10 Y10), N0050 moves the turret chuck
to 25mm below the table in the Z axis, thus making a hole, N0060 moves the turret
chuck back to 5mm above the table in the Z axis, so that it can be moved again
in the XY axes, N0070 positions the turret chuck at the (15,15) coordinate, and
finally N0080 and N0090 create the second hole.

It should be clear from this example that, when a large number of holes must
be drilled, the number of lines of code we have to write increases considerably.
The Haskell data structure intended to hold a CNC program is shown in Fig. 3. 

For simplicity, our data structure does not contain information about block 
numbering. Nevertheless, given a list of blocks, it is straightforward to build a 
function that adds such numbering to each block. In our context, a CNC pro- 
gram is composed of a header and a body. A header is an optional comment (a 
String or the constructor Nothing when missing). A body is a set of blocks, each 
block being a comment or a set of instructions. 
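
The description above can be turned into a minimal sketch of such a data structure. This is an assumed reconstruction, not the authors' Fig. 3: the constructors Code, G, X, Y and Z are taken from Fig. 4, whereas the names CNCprogram, Header, Body, Comment, renderInstr, renderBlock and numberBlocks, as well as the numbering step of 10, are assumptions made for illustration.

```haskell
-- Assumed reconstruction; only Code, G, X, Y and Z appear in Fig. 4.
type CNCprogram = (Header, Body)
type Header     = Maybe String          -- Nothing when the header is missing
type Body       = [Block]

data Block       = Comment String | Code [Instruction] deriving (Eq, Show)
data Instruction = G String | X Int | Y Int | Z Int    deriving (Eq, Show)

-- Render one instruction as CNC text, e.g. Z (-25) becomes "Z-25".
renderInstr :: Instruction -> String
renderInstr (G code) = "G" ++ code
renderInstr (X n)    = "X" ++ show n
renderInstr (Y n)    = "Y" ++ show n
renderInstr (Z n)    = "Z" ++ show n

-- Comments are parenthesized; instructions are separated by spaces.
renderBlock :: Block -> String
renderBlock (Comment s)   = "(" ++ s ++ ")"
renderBlock (Code instrs) = unwords (map renderInstr instrs)

-- Hypothetical helper: add block numbers N0010, N0020, ... to each block.
numberBlocks :: [Block] -> [String]
numberBlocks bs = zipWith label [10, 20 ..] (map renderBlock bs)
  where
    label :: Int -> String -> String
    label n s = "N" ++ pad (show n) ++ " " ++ s
    pad s     = replicate (4 - length s) '0' ++ s
```

Keeping block numbers out of the data type, as the text suggests, means that blocks produced by different functions can be concatenated freely and numbered once at the end.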

Figure 4 shows a function for drilling n holes in a straight line implemented 
in Haskell. Note that nHolesLine returns a list of blocks, instead of a complete 
CNC program as defined in Fig. 3 (the header is missing). Far from being a 
shortcoming, this is due to the fact that nHolesLine is integrated in an environ- 
ment with many other different functions; therefore, there is a master function 
which builds the whole program by prompting the user for a header comment, 
if any, and integrating the code obtained from the different functions. 

The function nHolesLine receives as parameters the number of holes, n, the 
initial XY coordinate and the corresponding increments in each axis. Then, this 
function generates a list of blocks by creating a list of n elements containing the 
absolute XY coordinates of the holes; it uses the higher-order function map to 
apply the function makeHole to each XY coordinate. 
-- creates a [Block] containing the CNC instructions for
-- making n holes in a straight line
nHolesLine :: Int -> Int -> Int -> Int -> Int -> [Block]
nHolesLine n x y incX incY = [posit, init] ++ concat (map makeHole nLine)
  where line  = createLine x y incX incY
        nLine = finiteLine n line
        posit = Code [G "90"]
        init  = Code [G "00", Z 5]

-- creates an infinite list of absolute coordinates
createLine :: Int -> Int -> Int -> Int -> [(Int, Int)]
createLine x y incX incY = (x,y) : createLine (x+incX) (y+incY) incX incY

-- takes a list and returns a sublist containing the first n elements
finiteLine :: Int -> [a] -> [a]
finiteLine _ []     = []
finiteLine 0 _      = []
finiteLine n (x:xs) = x : finiteLine (n-1) xs

-- takes an XY coordinate and returns a list of blocks containing
-- all instructions needed to make a hole at that position
makeHole :: (Int, Int) -> [Block]
makeHole (x,y) = [posXY, down, up]
  where up    = Code [Z 5]
        down  = Code [Z (-25)]
        posXY = Code [X x, Y y]



Fig. 4. Example program nHolesLine 



Note that, even though the list of n elements is created by first calling 
createLine — which generates an infinite list — only n elements of such a list are 
actually built. This is due to the lazy evaluation of Haskell (see Sect. 5). For 
instance, in order to make 50 holes, starting at position (10,10) and increasing 
5mm in each axis per hole, we simply call function nHolesLine as follows: 

nHolesLine 50 10 10 5 5 

The same example using G codes requires about 150 lines of code. With Haskell
functions, by contrast, it is very simple to change any of the parameters to obtain
any straight line of holes to be drilled. In the same way, many helpful functions
can be created in order to make different shapes such as ellipses, grids, etc., and
to work with other CNC machines such as lathes and milling machines.
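
As a sketch of how another shape could be produced in the same style, the following hypothetical function drills a rectangular grid of holes. The name holesGrid, its parameters and the local type declarations are assumptions made so that the example is self-contained; makeHole follows Fig. 4.

```haskell
-- Assumed minimal types; the constructor names follow Fig. 4.
data Instruction = G String | X Int | Y Int | Z Int    deriving (Eq, Show)
data Block       = Comment String | Code [Instruction] deriving (Eq, Show)

-- As in Fig. 4: position the chuck, drill down, move back up.
makeHole :: (Int, Int) -> [Block]
makeHole (x, y) = [Code [X x, Y y], Code [Z (-25)], Code [Z 5]]

-- Hypothetical: a grid of rows x cols holes, starting at (x0, y0),
-- with the same spacing (step, in mm) along both axes.
holesGrid :: Int -> Int -> (Int, Int) -> Int -> [Block]
holesGrid rows cols (x0, y0) step =
  [Code [G "90"], Code [G "00", Z 5]] ++ concatMap makeHole coords
  where
    coords = [ (x0 + c * step, y0 + r * step)
             | r <- [0 .. rows - 1], c <- [0 .. cols - 1] ]
```

A call such as holesGrid 2 3 (0,0) 10 yields two setup blocks followed by three blocks for each of the six holes.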

5 Some Advantages of Haskell 

In the following, we summarize the most significant advantages of Haskell for 
CNC programming: 
Data Structures and Recursion. Haskell allows the definition of complex data
structures (such as 3D geometric pieces) or iterative ones (such as hole meshes) 
that can be later manipulated and reused. A common way of defining and ma- 
nipulating such data structures is to use recursion. In Fig. 4, two lists, line and 
nLine, are used to describe a set of specific positions of a piece. We recursively 
apply a defined function to them by using a simple command. 

Polymorphism. Functions in Haskell can be polymorphic, i.e., they can be applied 
to different types (compare function length, which can be applied to lists of any 
kind). In our context, this means that some functions can be reused in many 
parts of the CNC program with different input data. 

Higher-Order Functions. Higher-order facilities [4] are one of the main advantages
of Haskell over the other languages currently used for CNC programming:
Haskell incorporates a large number of predefined higher-order functions, which
allow us to reduce the size of the code while keeping it highly expressive.

Laziness. Haskell follows a lazy evaluation model [1, 18], which means that func- 
tions are evaluated on demand. This is particularly useful when dealing with 
infinite data structures. For instance, consider a robot hand which performs a 
specific movement each time a piece is under it. Thanks to laziness, we can define 
the behavior of the robot hand by this infinite movement since it will only be 
evaluated as much as needed. To the best of our knowledge, all languages used
in CNC programming lack lazy evaluation (i.e., they are strict languages with
call by value evaluation).
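
The robot hand scenario can be sketched as follows. The type Move, its constructors and the function names are hypothetical; the point is only that an infinite behavior can be defined once and consumed as far as needed.

```haskell
-- Hypothetical movements of a robot hand.
data Move = Down | Grip | Up deriving (Eq, Show)

-- The infinite behavior: repeat the same three movements forever.
handCycle :: [Move]
handCycle = cycle [Down, Grip, Up]

-- Only the prefix that is actually demanded gets evaluated.
movesFor :: Int -> [Move]
movesFor pieces = take (3 * pieces) handCycle
```

Under strict evaluation the definition of handCycle would not terminate; under lazy evaluation movesFor 2 simply produces six movements.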

Type Checking System. Haskell includes a standard type inference algorithm dur- 
ing compilation. Type checking can be very useful to detect errors in a CNC 
program, e.g., to detect that a drilling tool has not been separated from the 
piece surface after its use. Since CNC programs are usually employed in mass- 
production processes, program errors are very expensive. Therefore, having built- 
in type error checkers provided by a high-level compiler represents a significant 
advantage over usual CNC programs written by hand. 
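
One possible way to let the type checker detect a tool that was not retracted is to track the tool state in a phantom type parameter. This is a sketch under our own assumptions, not the encoding used by the authors; all names (Raised, Lowered, Tool, start, moveXY, drill, retract, finish) are hypothetical.

```haskell
-- Phantom state markers: they have no values and exist only at the type level.
data Raised
data Lowered

-- A tool carrying the CNC lines emitted so far; `s` records its vertical state.
newtype Tool s = Tool [String]

start :: Tool Raised
start = Tool []

-- Moving in the XY plane is only allowed while the tool is raised.
moveXY :: Int -> Int -> Tool Raised -> Tool Raised
moveXY x y (Tool ls) = Tool (ls ++ ["X" ++ show x ++ " Y" ++ show y])

drill :: Tool Raised -> Tool Lowered
drill (Tool ls) = Tool (ls ++ ["Z-25"])

retract :: Tool Lowered -> Tool Raised
retract (Tool ls) = Tool (ls ++ ["Z5"])

-- A program can only be finished with the tool raised.
finish :: Tool Raised -> [String]
finish (Tool ls) = ls

oneHole :: [String]
oneHole = finish (retract (drill (moveXY 10 10 start)))
```

With these signatures, an expression such as moveXY 15 15 (drill start) is rejected at compile time, because drill returns a Tool Lowered where moveXY expects a Tool Raised.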

Type Classes. Type classes in Haskell provide a structured way to introduce overloaded
functions (see footnote 2) that must be supported by any type that is an instance of that
class [20]. This is a powerful concept that allows us, e.g, to arrange together data 
structures (representing CNC machines) having a similar functionality and to 
define standard functions for such types. When introducing a new data structure 
representing a CNC machine having a functionality similar to an existing one, 
we can just derive it from such a class, applying the existing functions to this 
data structure without re-writing any code. 

2 While polymorphic functions have a single definition over many types, overloaded
functions have different definitions for different types.

Verification and Heuristics. Haskell is a formal language with many facilities to
prove the correctness of programs [15]. This represents the main advantage of
our proposal compared with current languages being used for the same purpose.
Thus, formal verification of CNC programs can be performed to demonstrate
their termination, correctness, etc. Moreover, it makes possible the application
of heuristics to define the behavior of CNC machines (as happens with autonomous
robots). This is the subject of ongoing work and justifies our choice of
Haskell for CNC programming.

Furthermore, Haskell is amenable to formal verification by using theorem 
provers such as Isabelle [11] or HOL [9,17], verification logics such as P-Logic 
[5, 6], etc. 

6 Implementation Remarks 

The implementation of a Haskell library to design and manipulate CNC pro- 
grams has been undertaken. This library currently contains: 

— an XML DTD properly defined to completely represent any CNC program, 

— a specific Haskell data structure equivalent to the DTD, and 

— a set of functions to build and test CNC programs. 

In order to guarantee the portability of the CNC programs produced in our 
setting, we have defined an XML DTD which is able to represent any CNC 
program since it contains all the syntactic constructs specified in the standard 
ISO 6983 [3], as well as the extensions proposed in [16,19]. With this DTD, 
we can properly define any CNC program and, with some Haskell translation 
functions, automatically convert it to/from an equivalent Haskell data structure. 

We have also implemented a library of functions which allows us to build 
and transform the data structure representing CNC programs in a convenient 
way. We provide several basic testing and debugging functions. Preliminary
experiments are encouraging and point out the usefulness of our approach. More
information (the implementation of the Haskell library, the XML DTD and some
examples) is publicly available at http://www.dsic.upv.es/~jsilva/cnc



7 Conclusions and Future Work 

This work proposes Haskell as a high-level language for the design and implemen- 
tation of CNC programs. We have clarified its advantages over existing languages 
for CNC programming. Besides typical high-level features — such as control se- 
quence, recursion, rich data structures, polymorphism, etc. — Haskell provides 
several advanced features like higher-order combinators, lazy evaluation, type 
classes, etc. Furthermore, Haskell offers a clean semantics which allows the de- 
velopment of formal tools. 

We have defined a Haskell data structure which is able to represent any 
CNC program, allowing us to properly manipulate it by using Haskell features. 
Furthermore, we implemented an XML DTD which is fully equivalent to the 
Haskell data structure. This DTD ensures the portability of our programs among
applications and platforms. 

Preliminary experiments are encouraging and point out the usefulness of our
approach. However, there is plenty of work to be done, such as augmenting our
library with other useful functions for making geometric figures, defining functions
for other CNC machines (lathes, milling machines, etc.), defining libraries for
assisting the user in the post-processing of CNC programs, defining a graphical
environment for simplifying the task of designing CNC programs, etc.
Abstract. Over the last years various semantics have been proposed
for dealing with updates of logic programs by (other) logic programs.
Most of these semantics extend the stable models semantics of normal,
extended (with explicit negation) or generalized (with default negation
in rule heads) logic programs. In this paper we propose a well founded
semantics for logic program updates. We motivate our proposal with
both practical and theoretical arguments. Various theoretical results
presented here show how our proposal is related to the stable model 
approach and how it extends the well founded semantics of normal and 
generalized logic programs. 



1 Introduction 

When dealing with knowledge bases modelling knowledge that may change over 
time, an important issue is that of how to automatically incorporate new (up- 
dated) knowledge without falling into an inconsistency each time this new knowl- 
edge is in conflict with the previous one. When knowledge is represented by logic 
programs (LPs), this issue boils down to that of how to deal with LPs updates. 
In this context, updates are represented by sequences of sets of logic program- 
ming rules, also called dynamic logic programs (DyLPs), the first set representing 
our initial knowledge, while later ones represent new incoming information. In
recent years, several semantics have been proposed for logic program updates
[1,2,5,9,13-15,17,18]. Most of these semantics are extensions of the stable mod- 
els semantics of extended (with explicit negation) [12] or generalized (allowing 
default negation in rule heads) [16] LPs. This is a natural choice given the ap- 
propriateness of stable models for knowledge representation, and the simplicity 
of the definition of stable model semantics for normal LPs, which allows various 
extensions in a natural way. However, it is our stance that there are application 
domains for logic programs updates with requirements demanding a different 
choice of basic semantics, such as the well founded semantics [11]. One of such 
requirements is that of computational complexity: in applications that require 
the capability of dealing with an overwhelming mass of information, it is very 
important to be able to quickly process such information, even at the cost of 
losing some inference power. In this respect, as it is well known, the compu- 
tation of stable models is NP-hard, whereas that of the well founded model is 
polynomial. Another requirement not fulfilled by stable model semantics is that
of being able to answer queries about a given part of the knowledge without
the need to, in general, consult the whole knowledge base. The well founded
semantics complies with the property of relevance [8], making it possible to implement
query driven proof procedures that, for any given query, only need to
explore a part of the knowledge base. Moreover, in domains with a great amount
of highly distributed and heterogeneous knowledge, inconsistencies are bound to 
appear not only when new knowledge conflicts with old knowledge, but also 
within the new (or old) knowledge alone. To deal with contradictions that ap- 
pear simultaneously, the mechanisms of updates are of no use, and some form 
of paraconsistent semantics [7] is required, i.e. a semantics where these contra- 
dictions are at least detected, and isolated. A well founded based semantics for 
LPs updates seems to be the answer for domains where the above requirements 
are added with the need to update knowledge. However, as we mentioned above, 
most of the existing semantics are stable models based. A few attempts to de- 
fine a well founded semantics for DyLPs can be found [2,3,13]. Unfortunately, 
as discussed in Section 5.1, none of these is, in our opinion, satisfactory, be it 
because they lack a declarative definition of the semantics, or because they are 
too skeptical. 

In this paper we define the (paraconsistent) well founded semantics of DyLPs. 
This semantics is a generalization for sequences of programs of the well founded 
semantics of normal [11] and generalized LPs [6]. Moreover it is sound with respect to the
stable models semantics for DyLPs as defined in [1]. As for most of the existing
semantics for DyLPs, the approach herein is also based on the causal rejection 
principle [9,14], which states, informally: an old rule is rejected if there exists 
a more recent one which is supported and whose immediate conclusions are in 
conflict with the ones of the older rule. We extend this principle from a 2-valued 
to a 3-valued setting, and apply it to the well founded semantics. 

The rest of the paper is organized as follows. Section 2 recalls some prelim- 
inary notions and establishes notation. Section 3 presents the extension of the 
causal rejection principle to the 3-valued case. In section 4 the well founded se- 
mantics for DyLPs is defined, and in section 5 some of its properties are studied 
and relations with existing proposals (briefly) established. We end, in section 6, 
with some concluding remarks. 



2 Background: Language, Concepts and Notation 

In this section we briefly recall the syntax of DyLPs, a language introduced in
[2] for dealing with logic program updates, and their semantics as defined in
[1]. Our choice of this semantics for introducing the background is based on the
fact that, among the existing ones, it is the most credulous and that it properly
overcomes some problems of the existing ones, as shown in [1].

To represent negative information in logic programs and their updates, DyLP 
uses generalized logic programs (GLPs) [16], which allow for default negation 
not A not only in the premises of rules but also in their heads. A GLP defined 
over a propositional language $\mathcal{L}$ is a (possibly infinite) set of ground rules of the
form $L_0 \leftarrow L_1, \ldots, L_n$, where each $L_i$ is a literal in $\mathcal{L}$, i.e., either a propositional
atom $A$ in $\mathcal{L}$ or the default negation $not\,A$ of a propositional atom $A$ in $\mathcal{L}$. We
say that $A$ is the default complement of $not\,A$ and vice versa. Given a rule $\tau$ as
above, by $hd(\tau)$ we mean $L_0$ and by $B(\tau)$ we mean $\{L_1, \ldots, L_n\}$. In the sequel
an interpretation is simply a set of literals of $\mathcal{L}$. A literal $L$ is true (resp. false)
in $I$ iff $L \in I$ (resp. $not\,L \in I$) and undefined in $I$ iff $\{L, not\,L\} \cap I = \{\}$. A
conjunction (or set) of literals $C$ is true (resp. false) in $I$ iff $C \subseteq I$ (resp. $\exists L \in C$
such that $L$ is false in $I$). We say that $I$ is consistent iff $\forall A \in \mathcal{L}$ at most one of
$A$ and $not\,A$ belongs to $I$, otherwise we say $I$ is paraconsistent. We say that $I$ is
2-valued iff for each atom $A \in \mathcal{L}$ exactly one of $A$ and $not\,A$ belongs to $I$.
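
The notions of truth value and consistency just introduced can be sketched in a few lines. The representation of interpretations as lists and all names (Lit, Truth, compl, valueOf, consistent) are assumptions made for this example.

```haskell
-- A literal is an atom A (Pos) or its default negation not A (Neg).
data Lit   = Pos String | Neg String deriving (Eq, Show)
data Truth = T | F | U deriving (Eq, Show)

-- Default complement of a literal.
compl :: Lit -> Lit
compl (Pos a) = Neg a
compl (Neg a) = Pos a

-- Truth value of a literal in an interpretation (a list of literals).
-- In a paraconsistent interpretation a literal may be both true and
-- false; this sketch then reports T because that case is tested first.
valueOf :: [Lit] -> Lit -> Truth
valueOf i l
  | l `elem` i       = T
  | compl l `elem` i = F
  | otherwise        = U

-- Consistent: no atom occurs together with its default negation.
consistent :: [Lit] -> Bool
consistent i = not (any (\l -> compl l `elem` i) i)
```

A 2-valued interpretation is then one in which valueOf never returns U for any atom of the language.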

A dynamic logic program over a language $\mathcal{L}$ is a finite sequence $P_1 \oplus \ldots \oplus P_n$
(also denoted $\bigoplus P_i$, where the $P_i$s are GLPs indexed by $1, \ldots, n$), where all the
$P_i$s are defined over $\mathcal{L}$. Intuitively such a sequence may be viewed as the result
of, starting with program $P_1$, updating it with program $P_2$, ..., and updating it
with program $P_n$. For this reason we call the single $P_i$s updates. We use $\rho(\mathcal{P})$
to denote the multiset of all rules appearing in the programs $P_1, \ldots, P_n$.

The refined stable model semantics for DyLPs is defined in [1] by assigning
to each DyLP a set of stable models. The basic idea of the semantics is that, if
a later rule $\tau$ has a true body, then former rules in conflict with $\tau$ should be
rejected. Moreover, any atom $A$ for which there is no rule with true body in
any update, is considered false by default. The semantics is then defined by a
fixpoint equation that, given an interpretation $I$, tests whether $I$ has exactly
the consequences obtained after removing from the multiset $\rho(\mathcal{P})$ all the rules
rejected given $I$, and imposing all the default assumptions given $I$. Formally, let:

$$Default(\bigoplus P_i, I) = \{ not\,A \mid \nexists\, A \leftarrow body \in \rho(\mathcal{P}) \wedge body \subseteq I \}$$

$$Rej^S(\bigoplus P_i, I) = \{ \tau \mid \tau \in P_i \wedge \exists\, \eta \in P_j,\ i \le j,\ \tau \bowtie \eta \wedge B(\eta) \subseteq I \}$$

where $\tau \bowtie \eta$ means that $\tau$ and $\eta$ are conflicting rules, i.e. the head of $\tau$ is the
default complement of the head of $\eta$.

Definition 1. Let $\bigoplus P_i$ be a DyLP over language $\mathcal{L}$ and $M$ a two valued interpretation.
$M$ is a refined stable model of $\bigoplus P_i$ iff $M$ is a fixpoint of $\Gamma^S_{\oplus P_i}$:

$$\Gamma^S_{\oplus P_i}(M) = least\left( \rho(\mathcal{P}) \setminus Rej^S(\bigoplus P_i, M) \cup Default(\bigoplus P_i, M) \right)$$

where $least(P)$ denotes the least Herbrand model of the definite program obtained
by considering each negative literal $not\,A$ in $P$ as a new atom (see footnote 1).

The definition of dynamic stable models of DyLPs [2] is as the one above, but
where the $i \le j$ in the rejection operator is replaced by $i < j$. I.e., if we denote
this other rejection operator by $Rej(\bigoplus P_i, I)$, and define $\Gamma_{\oplus P_i}(I)$ by replacing
$Rej^S$ by $Rej$ in $\Gamma^S_{\oplus P_i}$, then the stable models of $\bigoplus P_i$ are the interpretations $I$
such that $I = \Gamma_{\oplus P_i}(I)$. Comparisons among these two definitions, as well as
further details, properties and motivation for the definition of this language and
semantics are beyond the scope of this paper, and can be found in [1,2].

1 Whenever clear from the context, hereafter we omit the $\bigoplus P_i$ in any of the above
defined operators.



3 The Notion of Causal Rejection for 3-Valued Semantics

According to the above mentioned causal rejection principle [9,14], a rule from 
an older program in a sequence is kept (by inertia) unless it is rejected by a more 
recent conflicting rule whose body is true. On the basis of this, the very basic 
notion of model has to be modified when dealing with updates. In the static 
case, a model of a program is an interpretation that satisfies all the rules of the 
program, where a rule is satisfied if its head is true or its body is false. If we 
want to adapt this idea to the updates setting, taking into consideration the causal
rejection principle, we should only require non rejected rules to be satisfied. Also
the concept of supported model [4] has to be revisited when dealing with updates.
In the static case, a model $M$ of $P$ is supported iff for every atom $A \in M$,
there is a rule in $P$ whose head is $A$ and whose body is satisfied in $M$. If we
extend the concept of supportedness to logic programs with updates, it would
be unnatural to allow rejected rules to support the truth of a literal.

The causal rejection principle is defined for 2-valued semantics; we want now
to extend it to a 3-valued setting, in which literals can be undefined, besides
being true or false. In the 2-valued setting, a rule is rejected iff there is a conflicting rule
in a later update whose body is true in the considered interpretation. In this
context, this is the same as saying that the body of the rejecting rule is not
false. In a 3-valued setting this is no longer the case, and the following question
arises: should we reject rules on the basis of rejecting rules whose body is true, or
on the basis of rules whose body is not false? We argue that the correct answer
is the latter. In the remainder we give both practical and theoretical reasons for 
our choice, but we want now to give an intuitive justification. Suppose initially 
we believe a given literal L is true. Later on we get the information that L is false 
if some conditions hold, but those conditions are (for now) undefined. As usual 
in updates, we prefer later information to the previous one. On the basis of such 
information, can we be sure that L remains true? It seems to us we cannot. The 
more recent source of information says if some conditions hold then L is false, 
and such conditions may hold. We should then reject the previous information 
and consider, on the basis of the most recent one, that L is undefined. 

On the basis of these intuitions, we extend the definition of update model
and update supported model to the 3-valued setting.

Definition 2. Let $\bigoplus P_i$ be any DyLP, and $M$ a 3-valued interpretation. $M$ is an
update 3-valued model of $\bigoplus P_i$ iff for each rule $\tau$ in any given $P_i$, $M$ satisfies $\tau$
(i.e. $hd(\tau) \in M$ or $B(\tau) \not\subseteq M$) or there exists a rule $\eta$ in $P_j$, $i < j$, such that
$\tau \bowtie \eta$ and $B(\eta)$ is not false in $M$. We say $M$ is a supported 3-valued update
model of $\bigoplus P_i$ iff it is an update 3-valued model and

1. for each atom $A \in M$, $\exists\, \tau \in P_i$ with head $A$ such that $B(\tau) \subseteq M$ and
$\nexists\, \eta \in P_j$, $i < j$, such that $\tau \bowtie \eta$ and $B(\eta)$ is not false in $M$.

2. for each negative literal $not\,A$, if $not\,A \in M$, then for each rule
$A \leftarrow body \in \rho(\mathcal{P})$ such that $body$ is true in $M$, there exists a rule $\eta$, in a
later update, whose head is $not\,A$, and such that $B(\eta)$ is true in $M$.

We illustrate, via an example, the intuitive meaning of the defined concepts. 

Example 1. Sara, Cristina and Bob, are deciding what they will do on Saturday. 
Sara decides she is going to a museum, Cristina wants to go shopping and Bob 
decides to go fishing in case Sara goes to the museum. Later on they update 
their plans: Cristina decides not to go shopping, Sara decides she will not go to 
the museum if it snows and Bob decides he will also go fishing if it is a sunny 
day. Moreover we know from the forecast that Saturday can be either a sunny
day or a rainy day. We represent the situation with the DyLP $P_1 \oplus P_2$, where:

$P_1$: museum(s).
       shopping(c).
       fish(b) ← museum(s).

$P_2$: fish(b) ← sunny.
       sunny ← not rain.
       rain ← not sunny.
       not shopping(c).
       not museum(s) ← snow.

The intended meaning of $P_1 \oplus P_2$ is that it does not snow on Saturday, but we
do not know whether it rains or not; we know Sara goes to the museum on Saturday,
Bob goes fishing and, finally, Cristina does not go shopping. In fact, every 3-valued
update model of $P_1 \oplus P_2$ contains {museum(s), not shopping(c), fish(b)}.
Suppose now Bob decides that, in the end, he does not want to go fishing if it
rains, i.e. our knowledge is updated with $P_3$: not fish(b) ← rain. Intuitively,
after $P_3$, we do not know whether Bob will go fishing since we do not know
whether Saturday is a rainy day. According to Definition 2, there is a supported
3-valued update model of $P_1 \oplus P_2 \oplus P_3$ in which shopping(c) is false, museum(s)
is true and fish(b) is undefined.
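
The first condition of Definition 2 (being an update 3-valued model, without the supportedness requirements) can be checked mechanically. The following is a minimal sketch for finite ground programs; the list-based representation and all names (Lit, Rule, DyLP, isUpdateModel, example1) are assumptions made here, and the interpretation used in the usage note is one we picked for illustration.

```haskell
data Lit  = Pos String | Neg String deriving (Eq, Show)
data Rule = Rule Lit [Lit] deriving (Eq, Show)   -- head and body
type DyLP = [[Rule]]                             -- P1, P2, ..., Pn

compl :: Lit -> Lit
compl (Pos a) = Neg a
compl (Neg a) = Pos a

-- M is an update 3-valued model iff every rule is satisfied in M or is
-- rejected by a conflicting rule in a later update whose body is not false.
isUpdateModel :: DyLP -> [Lit] -> Bool
isUpdateModel ps m = all ok rules
  where
    rules :: [(Int, Rule)]
    rules = [ (i, r) | (i, p) <- zip [1 ..] ps, r <- p ]
    isTrue  b = all (`elem` m) b
    isFalse b = any (\l -> compl l `elem` m) b
    ok (i, Rule h b) =
      h `elem` m || not (isTrue b) ||
      any (\(j, Rule h' b') -> i < j && h' == compl h && not (isFalse b')) rules

-- The programs P1, P2 and P3 of Example 1.
example1 :: DyLP
example1 =
  [ [ Rule (Pos "museum(s)") [], Rule (Pos "shopping(c)") []
    , Rule (Pos "fish(b)") [Pos "museum(s)"] ]
  , [ Rule (Pos "fish(b)") [Pos "sunny"], Rule (Pos "sunny") [Neg "rain"]
    , Rule (Pos "rain") [Neg "sunny"], Rule (Neg "shopping(c)") []
    , Rule (Neg "museum(s)") [Pos "snow"] ]
  , [ Rule (Neg "fish(b)") [Pos "rain"] ] ]
```

For instance, isUpdateModel example1 [Pos "museum(s)", Neg "shopping(c)"] holds: the rule fish(b) ← museum(s) is not satisfied, but it is rejected by the later rule not fish(b) ← rain, whose body is undefined and hence not false.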

It can be checked that, according to all existing stable models based semantics
for updates of [1,2,5,9,14,15], $P_1 \oplus P_2 \oplus P_3$ has two stable models: one where rain
is true and fish(b) is false, and another where rain is false and fish(b) is true.
A notable property of the well founded model in the static case is that of being
a subset of all stable models. If one wants to preserve this property in DyLPs, in
the well founded model of this example one should neither conclude fish(b) nor
not fish(b). If a rule were only rejected in case there is a conflicting later
one with true body (rather than not false as we advocate), since the body of
not fish(b) ← rain is not true, we would not be able to reject the initial rule
fish(b) ← museum(s), and hence would conclude fish(b). Hence, to preserve
this relation to stable models based semantics, the well founded model semantics
for DyLPs must rely on the notion of 3-valued rejection described above.

4 The Well Founded Semantics for DyLPs 

On the basis of the notion of causal rejection just presented, we define the Well 
Founded Semantics for DyLPs. Formally, our definition is made in a way similar 
to the definition of the well founded semantics for normal LPs in [10], where
the well founded model is characterized by the least alternating fixpoint of the
Gelfond-Lifschitz operator $\Gamma$ (i.e. by the least fixpoint of $\Gamma^2$). Unfortunately,
if we apply literally this idea, i.e. define the well founded model as the least 
alternating fixpoint of the operator used for the dynamic stable (or refined stable) 
models of DyLPs, the resulting semantics turns out to be too skeptical: 

Example 2. It is either day or night (but not both). Moreover, if the stars are vis- 
ible it is possible to make astronomical observations. This knowledge is updated 
with the information that: if it is night the stars are visible; the observatory is 
closed if it is not possible to make observations; and the stars are not visible: 

$P_1$: observe ← seestars.    day ← not night.    night ← not day.

$P_2$: seestars ← night.    not seestars.    closed(obs) ← not observe.

The intended meaning of $P_1 \oplus P_2$ is that currently the stars are not visible,
it is not possible to make astronomical observations and, hence, the observatory
is closed. However, as is easy to check, the least alternating fixpoint of $\Gamma_{P_1 \oplus P_2}$
is {not seestars}, in which one is not able to conclude that the observatory
is closed. This is, in our opinion, not satisfactory: since we conclude that we
cannot see the stars, we should also conclude that we cannot make astronomical
observations and that the observatory is closed. Notably, the least alternating
fixpoint of $\Gamma^S$ yields even more skeptical results. In fact, this is a general result
which is an immediate consequence of Lemma 1 below.

In order to overcome this problem, we define the well founded model as the least
fixpoint of the composition of two different (antimonotonic) operators. Such operators have
to deal with the causal rejection principle described above, in which a rule is
to be rejected in case there is a later conflicting one whose body is not false.
In the well founded semantics of normal logic programs, if there exists a rule
$A \leftarrow body$ (where $A$ is an atom), such that $body$ is not false in the well founded
model, then $A$ is not false as well. Consider now the same situation in an update
setting with a rule $L \leftarrow body_1$, where $body_1$ is not false. In this situation we
should conclude that $L$ is not false unless there exists a rule $not\,L \leftarrow body_2$,
where $body_2$ is true, in the same or in a later program in the sequence. In fact,
note that the rule for $not\,L$ is not rejected by the one for $L$. Since the body of
the former is true, according to the causal rejection principle $not\,L$ should be
true (i.e. $L$ should be false) unless the rule is rejected by some later rule. In any
case, $L \leftarrow body_1$ is no longer playing any role in determining the truth value of
$L$. For this reason we allow rules to reject other rules in previous or in the same
update while determining the set of non-false literals of the well founded model
and, accordingly, we use $\Gamma^S_{\oplus P_i}$ as the first operator of our composition.

For determining the set of true literals according to the causal rejection
principle, only the rules that are not rejected by conflicting rules in later
updates should be put in place. For this reason we use Γ_{⊕Pi} as the second
operator, the well founded model being thus defined as the least fixpoint of ΓΓ^S.

Definition 3. The well founded model WFDy(⊕Pi) of a DyLP ⊕Pi is the (set
inclusion) least fixpoint of Γ_{⊕Pi} Γ^S_{⊕Pi}.

Since both Γ and Γ^S are antimonotonous (cf. [1,14]), ΓΓ^S is monotonous,
and so it always has a least fixpoint. In other words, WFDy is uniquely
defined for every DyLP. Moreover, WFDy(⊕Pi) can be obtained by (transfinitely)
iterating ΓΓ^S, starting from the empty interpretation.
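
The iteration can be illustrated on the classical single-program case. The sketch below is not the DyLP construction itself: it assumes a single normal logic program, for which both operators are taken to be the Gelfond-Lifschitz operator Γ, so that the iteration computes the least fixpoint of Γ². The names `gamma`, `least_fixpoint` and `well_founded`, and the rule encoding, are illustrative choices rather than notation from the paper.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

# A normal rule: (head atom, positive body atoms, negated body atoms).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]
Operator = Callable[[Set[str]], Set[str]]


def gamma(program: List[Rule], interpretation: Set[str]) -> Set[str]:
    """Gelfond-Lifschitz operator: least model of the reduct of `program`."""
    # Reduct: drop every rule with a negated atom in the interpretation,
    # then forget the negative bodies of the remaining rules.
    reduct = [(h, pos) for h, pos, neg in program if not (neg & interpretation)]
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in reduct:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model


def least_fixpoint(outer: Operator, inner: Operator) -> Set[str]:
    """Iterate outer(inner(.)) from the empty interpretation until stable."""
    current: Set[str] = set()
    while True:
        nxt = outer(inner(current))
        if nxt == current:
            return current
        current = nxt


def well_founded(program: List[Rule]) -> Tuple[Set[str], Set[str]]:
    """Return (true atoms, false atoms) of the well founded model."""
    atoms = {h for h, _, _ in program}
    for _, pos, neg in program:
        atoms |= pos | neg

    def op(i: Set[str]) -> Set[str]:
        return gamma(program, i)

    true_atoms = least_fixpoint(op, op)
    # Atoms outside gamma(true_atoms) are not "non-false", hence false.
    false_atoms = atoms - gamma(program, true_atoms)
    return true_atoms, false_atoms


# P1 of Example 2, taken alone as a normal program.
p1 = [
    ("observe", frozenset({"see_stars"}), frozenset()),
    ("day", frozenset(), frozenset({"night"})),
    ("night", frozenset(), frozenset({"day"})),
]
print(well_founded(p1))
```

For P1 alone, no atom is true, observe and see_stars are false, and day and night remain undefined. For a DyLP the two arguments of `least_fixpoint` would be distinct operators, which is why the function takes them separately.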

For the dynamic logic program P1 ⊕ P2 of example 2, the well founded model
is {not see_stars, not observe, closed(obs)}. So, in this example, WFDy yields
the desired less skeptical conclusions. In fact, in general WFDy is less skeptical
than any semantics resulting from any other combination of Γ^S and Γ.

Lemma 1. Let ⊕Pi be a DyLP, and let X, Y be two interpretations such that
X ⊆ Y. Then Γ^S(Y) ⊆ Γ(X).

From this Lemma it follows that the least fixpoint of any other combination
is a pre-fixpoint of ΓΓ^S and, as such, a subset of the least fixpoint of ΓΓ^S.

Example 3. Let P1, P2 and P3 be the programs of example 1. As desired, the
well founded model of P1 ⊕ P2 ⊕ P3 is {not snow, museum(s), not shopping(c)}.

As shown in example 1, WFDy(P1 ⊕ P2 ⊕ P3) is a supported 3-valued update
model. This result holds in general, whenever the well founded model does not
contain any pair of complementary literals.

Theorem 1. Let ⊕Pi be a DyLP and W its well founded model. Then, if W
contains no pair of complementary literals, W is a supported 3-valued update
model of ⊕Pi.

The proviso of W not containing any pair of complementary literals is due
to the fact that, since the notion of interpretation we use allows contradictory
sets of literals, the well founded model of a DyLP can be contradictory. We say
that a DyLP ⊕Pi is consistent (or non contradictory) iff WFDy(⊕Pi) is consistent,
i.e. it does not contain any pair of complementary literals.

5 Properties 

Our motivation for defining a new semantics for logic program updates, as
described in the Introduction, is based on a number of requirements. Hence, we
briefly examine to what extent those requirements are indeed met by WFDy, and
briefly compare it with existing approaches.

One of the important requirements is that of having a semantics computable
in polynomial time. It is not difficult to check that the computation of both
Γ_{⊕Pi}(I) and Γ^S_{⊕Pi}(I) is polynomial in the size of the DyLP, and so:

Proposition 1. The well founded model of any finite ground dynamic logic
program ⊕Pi is computable in polynomial time in the number of rules in ⊕Pi.

Another required property is that of relevance [8], so as to guarantee the
possibility of defining query driven proof procedures. Informally, in normal (single)
programs, a semantics complies with relevance if the truth value of any atom A 
in a program only depends on the rules relevant for this literal (i.e. those rules 
with head A , or with a head A' such that A' belongs to the body of (another) 
relevant rule). In order to establish results regarding relevance of WFDy we 
have first to define what is the relevant part of a DyLP (rather than a single 
program) wrt a literal (rather than atom) . 

Definition 4. Let ⊕Pi be any DyLP in the language ℒ and L, B, C literals in
ℒ. We say L directly depends on B iff B occurs in the body of some rule in ⊕Pi
with head L or not L. We say L depends on B iff L directly depends on B or
there is some C such that L directly depends on C and C depends on B. We call
Rel_L(⊕Pi) the dynamic logic program P1^L ⊕ ... ⊕ Pn^L such that Pi^L is the
set of all rules of Pi with head L or not L or some B on which L depends.

This definition simply applies the above intuition of relevance considering 
sequences of programs, and by stating that rules for not A are relevant for A 
(and vice-versa). And WFDy complies with relevance exactly in these terms: 

Theorem 2. Let ⊕Pi be a DyLP in the language ℒ and A any atom of ℒ. Then
WFDy(⊕Pi) ∩ {A, not A} = WFDy(Rel_A(⊕Pi)) ∩ {A, not A}.
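
The dependency relation and the relevant part of a DyLP can be sketched as a simple reachability computation. The code below is only an illustration under stated assumptions: literals are encoded as strings ("a" or "not a"), a DyLP is a list of programs, and the helper names `complement`, `depends_on` and `relevant_part` are hypothetical. Following the remark that rules for not A are relevant for A (and vice-versa), the sketch also keeps rules whose head is the complement of a literal on which L depends; this is one reading of the definition, not necessarily the authors' exact formulation.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, FrozenSet[str]]  # (head literal, body literals)
Program = List[Rule]


def complement(literal: str) -> str:
    """Map A to 'not A' and 'not A' to A."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def depends_on(dylp: List[Program], literal: str) -> Set[str]:
    """All literals B such that `literal` depends on B."""
    seen: Set[str] = set()
    frontier = [literal]
    while frontier:
        current = frontier.pop()
        heads = {current, complement(current)}
        for program in dylp:
            for head, body in program:
                if head in heads:
                    for b in body:
                        if b not in seen:
                            seen.add(b)
                            frontier.append(b)
    return seen


def relevant_part(dylp: List[Program], literal: str) -> List[Program]:
    """Keep, program by program, only the rules relevant for `literal`."""
    targets = {literal} | depends_on(dylp, literal)
    heads = targets | {complement(t) for t in targets}
    return [[rule for rule in program if rule[0] in heads] for program in dylp]


# The DyLP of Example 2.
p1 = [
    ("observe", frozenset({"see_stars"})),
    ("day", frozenset({"not night"})),
    ("night", frozenset({"not day"})),
]
p2 = [
    ("see_stars", frozenset({"night"})),
    ("not see_stars", frozenset()),
    ("closed(obs)", frozenset({"not observe"})),
]
print(relevant_part([p1, p2], "day"))
```

For the literal day only the two rules for day and night in P1 survive, whereas for closed(obs) every rule of both programs is relevant. The sequence structure is preserved because rules are filtered inside each program rather than merged.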

As noted above, WFDy can be contradictory. In these cases, inconsistent 
conclusions for a given atom may follow, but without necessarily having a con- 
tradiction in all atoms. However, as desired in updates, these contradictions in 
atoms may only appear in case there are two conflicting simultaneous rules (i.e.
in the same program of the sequence) which are both supported, and none of them
is rejected by some later update: 

Theorem 3. The well founded model W of a sequence ⊕Pi is noncontradictory
iff for all τ, η ∈ Pi such that τ ⋈ η, W ⊨ B(τ) and W ⊨ B(η), there exists
γ ∈ Pj, i < j, such that γ ⋈ τ or γ ⋈ η, and Γ^S(W) ⊨ B(γ).

Finally, it was our goal to find a proper generalization of the well founded
semantics of single programs to logic program updates. It is thus important to
guarantee that WFDy coincides with the well founded semantics of GLPs [6]
when the considered DyLP is a single program P, and with the well founded
semantics of normal programs [11] when, furthermore, that single program has
no negation in rule heads. Denoting by WFG(P) the well founded semantics of
the generalized logic program P according to [6]:

Theorem 4. Let P be a generalized program. Then WFG(P) = WFDy(P). 

Since WFG(P) coincides with the well founded semantics of [11] when P is
a normal program (cf. [6]), it follows that WFDy coincides with the semantics
of [11] whenever the sequence is made of a single normal logic program.

5.1 Brief Comparisons

Among the various semantics defined for sequences of LPs [1, 2, 5, 9, 14, 15, 18],
WFDy shares a close relationship with the refined stable models semantics of [1],
resembling that between stable and well founded semantics of normal programs.

Proposition 2. Let M be any refined stable model of ⊕Pi. The well founded
model WFDy(⊕Pi) is a subset of M. Moreover, if WFDy(⊕Pi) is a 2-valued
interpretation, it coincides with the unique refined stable model of ⊕Pi.

This property does not hold if, instead of the refined semantics, we consider
any of the other semantics based on causal rejection [2, 5, 9, 14, 15]. This is so
because these semantics are sometimes overly skeptical, in the sense that they
admit too many models, and thus have a smaller set of conclusions (in the
intersection of all stable models). One particular case when this happens is when
a sequence is updated with tautologies. Though, intuitively, updating our
knowledge with a tautology should have no influence on the results, this is not
the case in any of those semantics. For example, with all the cited semantics,
updating the program P1 ⊕ P2 of example 2 (which, according to all of them, has a
single stable model containing not see_stars) with the program
P3 : see_stars ← see_stars leads to two stable models: one in which see_stars is
true and the other in which see_stars is false. Thus, this intuitively harmless
update prevents not see_stars from being concluded. This is not the case with
[1], nor with WFDy, which are both immune to tautologies. In this example, both
conclude that not see_stars is true, before and after the update. For further
details on this topic see [1].

Notably, all the attempts found in the literature [2, 3, 13] to define a well
founded semantics for logic program updates are overly skeptical as well.
According to all the cited semantics, the well founded model of P1 ⊕ P2 ⊕ P3 is the
empty set; hence, unlike WFDy, they are not able to conclude that not see_stars
is true. Though it cannot be detailed here, other classes of programs exist,
besides the ones with tautologies, where the cited semantics bring more skeptical
results than WFDy. Moreover, the definition of all these semantics is based on
a complex syntactical transformation of sequences into single programs, making
it difficult to grasp what the declarative meaning of a sequence is.

6 Concluding Remarks 

Guided by the needs of applications, it was our purpose in this paper to define a 
semantics for DyLPs fulfilling some specific requirements. Namely: a semantics 
whose computation has polynomial complexity; that is able to deal with con- 
tradictory programs, assigning them a non trivial meaning; that can be used to 
compute answers to queries without always visiting the whole knowledge base. 
With this in mind, we have defined a well founded semantics for DyLPs. 

The defined semantics is a generalization of the well founded semantics of 
normal and generalized logic programs, and it coincides with them when the 
DyLP consists of a single program. It has polynomial complexity and obeys
relevance. Regarding the requirement of being able to deal with contradictions, lack
of space prevented us from further elaborating on the properties of the semantics 
in these cases, and on how it can in general be used to detect contradictory lit- 
erals and literals that depend on contradictions. We have, nonetheless, provided 
a complete characterization of the non contradictory cases. 

We briefly compared the proposed semantics to the existing ones for DyLPs
that are based on the causal rejection principle, and showed that it is a skeptical
approximation of the refined stable model semantics for DyLPs, and less skeptical
than all other existing well founded based semantics for updates. Comparisons
to semantics of updates that are not based on the causal rejection principle
are outside the scope of this paper. For an analysis of these semantics, and
comparisons to the above ones, see e.g. [14].

Abstract. An important problem for knowledge representation is the
specification of the behavior of information systems. To solve this prob- 
lem, different formal techniques have been used and many authors have 
endorsed the use of different sorts of logics. 

This problem is even more important in software engineering, where 
the main modeling languages are defined with no formal semantics. For 
example UML 2.0 has been proposed using a textual semantics. 

This work has a twofold objective: Firstly, we introduce a novel modal 
temporal logic called LNi 1 , which is the natural extension to first or- 
der of LNint-e, presented in [17,7]. Secondly, we show the usefulness 
of our logic for solving an important and specific knowledge representa- 
tion problem: Providing UML with a formal semantics (focusing on state 
machines). This way we want both to avoid the disadvantages of current 
textual UML semantics and to provide a formal basis for further verifi- 
cation and validation of UML models. Our new logic LNi 1 overcomes 
the two main limitations of LNint-e in the formalization of UML state 
machines: the use of parameters in the operations and the specification 
of the communications between objects. 

Keywords: Interval temporal logic, knowledge representation, formal 
semantics, intelligent information systems, state machines. 



1 Introduction 

In the field of software engineering the need for a formalization to achieve precise,
unambiguous specifications is widely accepted. These specifications constitute the
necessary preliminary step for formal property verification.

This formalization can be achieved using several techniques (logic, algebraic
specifications, etc.). Chomicki and Saake [3] make a case for logic in the
following statement: “Logic has simple, unambiguous syntax and semantics. It is
thus ideally suited to the task of specifying information systems”.

Temporal logic is a natural candidate for the formalization of the dynamic
behavior of software systems. This is supported by its use by well known authors
such as Pnueli [14] or Lamport [13], as well as by its recent use in CASE tools
that include formal methods in the software development process [19, 12]. The
appropriateness of temporal logic for computing is also noted in [3]: “although
temporal logic has been studied by logicians for a long time, its use in this area
is new and leads to many interesting research problems”.

The main goal of this work is to deepen the use of temporal logic in knowledge
representation. Specifically, we make two contributions:

— The first one is the development of a novel temporal first order logic of points 
and intervals. This modal logic, that we call LNi 1 , is the extension of the 
propositional logic LNint-e [5,17]. LNi 1 is a many-sorted logic and all its 
domains are finite or countable. 

LNi 1 is characterized by combining expressions of points and intervals
and the relative and absolute temporal approaches. We must remark the
use in LNi 1 of temporal connectives based on the concepts of precedence and
posteriority, and its topological semantics [2].

— The second one shows the practical applicability of our logic to a specific
knowledge representation problem: the modeling of the behavior of software
systems. In that respect we have focused on UML, the current standard modeling
language, and more specifically on its main behavioral model: state machines. 1

It is important to remark that we developed a previous formalization of 
state machines using the propositional logic LNint-e [17,7]. The first order 
logic developed in this work constitutes an important advance, as it allows 
the formalization of parameters in the actions, as well as the specification of 
the communications between objects. Both are basic aspects in the behavior 
of software systems that could not be dealt with in previous formalizations. 

In the literature it is possible to find several different formalizations of
state machines based on different techniques: graphs, automata, logic, Petri
nets, model checking. A thorough study of these different approaches can be
found in [17,20].

This work is structured as follows: In the next section we present the logic 
LNi 1 and we detail its syntax and semantics, and in section 3 we show the main 
novelties in our state machines formalization. The conclusions and future works 
sections can be found at the end. 

2 LNi 1 Logic 

In this work we present the logic LNi 1 , a modal temporal logic that is a first
order extension of the temporal propositional logic LNint-e (introduced in [17, 5]),
which uses (Z, <) as the flow of time.

2.1 Syntax of LNi 1 

Alphabet. Before starting the composition of the alphabet, it is convenient to 
show the main ideas that guide the construction of our logic: We have decided to 

1 State machines were already a consolidated model before UML appeared [11]. 
combine points and intervals as temporal primitives. In addition, going further
in the study of temporal ontologies (a detailed study can be found in [18]) we
have concluded that two kinds of interval expressions are necessary, namely:

— Hereditary Expressions [18]: When affirmed at an interval, they are also true
at the points of that interval. In LNi 1 this kind of expression shares a
representation with point expressions.

— Non-hereditary Expressions [18]: When affirmed at an interval, they are not
true at the points of that interval. At this point we must remark that we
distinguish between non-hereditary types and non-hereditary executions, which
represent the concrete occurrences of a non-hereditary type. Each of these
executions is distinguished by means of a label.

Before presenting the alphabet of LNi 1 , we would like to state that it is 
defined taking into consideration the domains that we use when applying it to 
software modeling. These domains refer to classes and objects (in the usual sense 
of object oriented paradigm) and labels for non-hereditary executions. According 
to this, the alphabet of LNi 1 includes the following symbols: 

— The set of classical connectives {¬, ∨, ∧, →, ∀, ∃} and the set of binary
temporal connectives of points {^, ^=}.

— The symbols ⊤ and ⊥, to denote truth and falsity, respectively.

— The sets C_c = {c, c1, .., cn, ..}, C_o = {o, o1, .., on, ..} and C_l = {l, l1, .., ln, ..} of
symbols to denote class, object and label constants, respectively.

— The sets V_c = {x, x1, .., xn, ..}, V_o = {y, y1, .., yn, ..} and V_l = {z, z1, .., zn, ..}
of symbols to denote class, object and label variables, respectively.

— The sets F_c = {f_c, f_c1, ..., f_cn, ...}, F_o = {f_o, f_o1, ..., f_on, ...} and
F_l = {f_l, f_l1, ..., f_ln, ...} of symbols to denote class, object and label
functions, respectively.

— The following sets of predicate symbols: P_p = {P, Q, .., P1, Q1, .., Pn, Qn, ...}
to denote point predicates and hereditary interval predicates; P_nh = {α,
β, ..., α1, β1, ..., αn, βn, ...} to denote non-hereditary interval predicates;
and a set Z = {m | m ∈ Z} to denote dates, i.e. to name known instants. 2

— The symbols ↑, ↓, [, ], (, ) and

The LNi 1 Language. At this point we present the LNi 1 language, firstly 
stating its terms and later its atoms and finally its well-formed formulas. 

Terms. 

1. The constants and variables are terms. Depending on the types considered
in the alphabet we distinguish between class terms, object terms
and label terms. We use T_l, T_o and T_c to denote the sets of label, object and
class terms, respectively. We define T = T_l ∪ T_o ∪ T_c, the set of terms.



2 Date predicates have arity 0.



2. Given a function symbol f ∈ F_d, with d ∈ {l, o, c}, of arity n and the terms
t1, ..., tn ∈ T, then f(t1, ..., tn) ∈ T_d.

Each n-ary function has a signature belonging to S^n, where S ∈ {T_l, T_o, T_c}.
For example, given a binary object function f_o (with signature
T_o × T_c) and two terms t1 ∈ T_o and t2 ∈ T_c, then f_o(t1, t2) is an object term.

3. There are no more terms than those built as stated in items 1 and 2. 

Atoms. We will set three types of atoms: 

1. Point or Hereditary Atoms: Are those of the form P(t1, ..., tn), where P is a
symbol of point or hereditary n-ary predicate and t1, ..., tn ∈ T. We denote
Ω_p as the set of point atoms. Each n-ary point or hereditary predicate will
have a signature belonging to S^n, where S ∈ {T_l, T_o, T_c}.

2. Date Atoms: If m ∈ Z then m is a date atom.

3. Atoms Derived from N on-hereditary Expressions: The following type of atoms 
implements some of the essential ideas in the construction of LNi 1 . As pre- 
viously stated, our logic has a great expressive power as it combines points 
and interval expressions. Nevertheless, we thought it appropriate to char- 
acterize interval expressions by means of expressions of points, as doing so, 
we obtain some advantages from the point of view of complexity: on the 
one hand, all the logic expressions become point expressions, and these use 
simpler connectives; on the other hand, we avoid the ambiguous concept of 
the current interval that the interval modal logics impose. 

This characterization is direct in hereditary interval expressions, as, by
definition, they are “inherited” by the points that are part of the interval.
Nevertheless, this is not the case with non-hereditary interval expressions. 
Thus, we decided to characterize these expressions by their start and end 
instants, and their course. 

On the other hand, we have to keep in mind that, regarding non-hereditary
expressions, in LNi 1 we set apart non-hereditary types and non-hereditary
executions. In LNi 1 the non-hereditary types are represented by means of
non-hereditary predicates, and the label that represents an execution of
this type is included as an extra argument in the predicate and is shown as
a subindex. For example, let us consider a non-hereditary type “Calculate
Total” that has two arguments. The representation in LNi 1 will be
by means of a non-hereditary predicate CalcularTotal. If we want to represent
a specific execution of that non-hereditary type on two objects o1 and o2 and
we assume that the label associated to that execution is l, we would write
in LNi 1 CalcularTotal_l(o1, o2).

All this argumentation can be summed up as follows: for each symbol of
non-hereditary n-ary predicate α ∈ P_nh, l ∈ T_l and t1, ..., tn ∈ T,
three kinds of associated (point) atoms are defined:

— ↑α_l(t1, .., tn) that represents the start instant of the execution α_l(t1, .., tn)

— ↓α_l(t1, .., tn) that represents the end instant of the execution α_l(t1, .., tn)

— α_l(t1, .., tn) that represents an instant during the course of the execution
α_l(t1, .., tn)

Every n-ary non-hereditary predicate will have a signature belonging to S^n,
where S ∈ {T_l, T_o, T_c}.



Well-Formed Formulas ( wffs ): We already have all the necessary elements to 
present well-formed formulas included in the language of LNi 1 : 

1. Atoms, ⊤ and ⊥ are wffs

2. If A and B are wffs, ¬A, A ∨ B, A ∧ B, A =4 B and A^= B are wffs too

3. If A is a wff and x a variable, ∀x A and ∃x A are wffs too

4. There are no more wffs than the ones formed according to these points. 

The readings of all these wffs are well-known except those of the temporal 
connectives of LNi 1 , A and ^=, that include the temporal relationships of
precedence and posteriority. These readings will be detailed in the following
section, devoted to the semantics of LNi 1 . 

2.2 Semantics 

LNi 1 is a many-sorted logic with the following domains: C for classes, O for
objects and L for labels. The first two are finite, and the domain of labels is
countable. 3 We denote D = {C, O, L}.

From these domains we can present the concept of interpretation for LNi 1 : 

Definition 1. An interpretation for LNi 1 is a tuple (C, O, L, I), where C, O
and L are non empty sets that represent domains of classes, objects and labels,
respectively, and I is a mapping that associates:

— To each class constant symbol c an element I(c) ∈ C. The interpretation
of object and label constants is analogously constructed.

— To each n-ary class function symbol f_c an n-ary function over D^n, that is,
I(f_c) : D^n → C. The interpretation of object and label functions is
analogously constructed.

— To each symbol of n-ary point predicate P an n-ary relation over D^n, that
is, I(P) ⊆ D^n

— To each symbol of n-ary non-hereditary predicate α an n-ary relation on
L × D^n, that is, I(α) ⊆ L × D^n

Given an interpretation (C, O, L, I), the variables have the expected meaning,
that is, they represent arbitrary elements of the corresponding domain.

Before defining the concept of temporal interpretation we need to define some 
other previous concepts: 

— Let nhEx = {α_l(t1, ..., tn) | α ∈ P_nh, l ∈ T_l, t1, ..., tn ∈ T} be the set of
all non-hereditary executions. The terms t1, ..., tn must be of the type
required by the signature of α.



3 As a matter of fact, except in modeling of objects of permanent existence (that is, 
those that are carrying out actions for an infinite time), a finite domain suffices. 
— We denote Int(Z) = {[i1, i2] | i1, i2 ∈ Z, i1 < i2} as the set of closed finite
intervals of Z with different start and end points.

— Let Ω_der = {↑α_l(t1, .., tn), α_l(t1, .., tn), ↓α_l(t1, .., tn) | α ∈ P_nh, l ∈ T_l,
t1, .., tn ∈ T} be the set of point atoms derived from the non-hereditary
ones.

— We denote Ω = Ω_p ∪ Z ∪ Ω_der ∪ {⊤, ⊥}

Definition 2. A temporal interpretation for LNi 1 is a pair of functions 
T1 = (H exec ,h), where: 

— H_exec : nhEx → Int(Z) associates each non-hereditary execution α_l(t1, .., tn)
with the only interval where α_l(t1, ..., tn) holds.

We must remark that the semantics of LNi 1 avoids the important
restriction of LNint-e, which required all the non-hereditary executions of
the same non-hereditary class to have the same duration [17].

— h : Ω → 2^Z associates each atom of Ω with a subset of Z that satisfies the
following conditions:

1. h(⊥) = ∅; h(⊤) = Z and h(m) = {m}, for all m ∈ Z

2. For all α_l(t1, .., tn) ∈ EventExecs, if H_exec(α_l(t1, .., tn)) = [i1, i2], then:
h(↑α_l(t1, .., tn)) = {i1}, h(α_l(t1, .., tn)) = (i1, i2) and h(↓α_l(t1, .., tn)) = {i2}.
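
Condition 2 can be made concrete with a small sketch. It assumes that the open interval (i1, i2) is taken over Z, so that it contains the integers strictly between the two endpoints; the function name `point_atoms` and the "start:"/"end:" key prefixes are illustrative stand-ins for the start and end atoms, not notation from the paper.

```python
from typing import Dict, Set, Tuple


def point_atoms(execution: str, interval: Tuple[int, int]) -> Dict[str, Set[int]]:
    """Instants assigned to the three point atoms of one execution.

    `interval` plays the role of H_exec(execution) = [i1, i2], a closed
    finite interval of Z with different start and end points.
    """
    i1, i2 = interval
    if not i1 < i2:
        raise ValueError("start and end points must satisfy i1 < i2")
    return {
        "start:" + execution: {i1},             # start instant
        execution: set(range(i1 + 1, i2)),      # course: open interval (i1, i2)
        "end:" + execution: {i2},               # end instant
    }


atoms = point_atoms("CalcularTotal_l(o1,o2)", (2, 5))
print(atoms)
```

With the interval [2, 5] the start atom holds only at 2, the end atom only at 5, and the course atom at 3 and 4. An execution over [i1, i1 + 1] has an empty course, which is consistent with the interval being open.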



Naturally, it is now necessary to extend the concept of temporal interpretation 
to any wff of LNi 1 . We will present this extension exclusively for the future 
fragment of LNi 1 (for the past fragment it is symmetrically extended). The 
extension is based on the key concept of our topological semantics, denoted m_t^+ A,
that represents the following idea: If A is a wff of LNi 1 and t ∈ Z, we define:
m_t^+ A = min{t' ∈ Z | t' > t and A is true at t'}

In other words, m_t^+ A is the first instant after t in which A will be true.
From these concepts we define the extension of a temporal interpretation EL = 
(H execi h), that actually only requires the extension of the function h to any wff . 
This extension becomes the usual form for boolean connectives. The extension 
for the future temporal connective considering that A, B are wff of LNi 1 , is 

h(A =4 B) = {t ∈ Z | m_t^+ A < +∞ and m_t^+ A ≤ m_t^+ B}
where m_t^+ A is formally defined as follows:

Definition 3. Let A be a wff of LNi 1 and t ∈ Z, we define
m_t^+ A = min((t, +∞) ∩ h(A)).

We agree that min ∅ = +∞.

Therefore, the meaning of the temporal connective is the following: 

A =4 B is read as sometime in the future A, and the next occurrence of A 
will be before or simultaneous to the next occurrence of B. 
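
This reading can be checked on small examples with the following sketch. It assumes that h(A) and h(B) are given as finite sets of integers, and that the comparison between the two next occurrences is non-strict, as the phrase "before or simultaneous" suggests; the names `next_occurrence` and `precedes` are hypothetical.

```python
import math
from typing import Set


def next_occurrence(t: int, holds: Set[int]) -> float:
    """First instant strictly after t in `holds`, or +infinity if none."""
    later = [u for u in holds if u > t]
    return min(later) if later else math.inf


def precedes(t: int, holds_a: Set[int], holds_b: Set[int]) -> bool:
    """Future connective at t: A occurs later on, no later than the next B."""
    m_a = next_occurrence(t, holds_a)
    return m_a < math.inf and m_a <= next_occurrence(t, holds_b)


h_a, h_b = {3, 7}, {5}
print([t for t in range(0, 8) if precedes(t, h_a, h_b)])
```

With h(A) = {3, 7} and h(B) = {5}, the connective holds at instants 0 to 2 (the next A, at 3, comes before the next B, at 5) and again at 5 and 6 (no further B exists, so the next A at 7 is trivially earlier). It fails at 3 and 4, where the next B precedes the next A, and at 7, where A never occurs again.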

The concepts of validity and satisfiability are defined in the usual manner. 

2.3 Temporal Relations in LNi 1

In this section we introduce some defined connectives needed in our application. 

Point Relations. The system {^, has fully expressive power regarding point 
expressions [2]. So, we may define 4 other well known temporal connectives: O 
(sometimes in the future), □ (always in the future), © (next) and 0 (prior). 

Here, we present new point connectives used to formalize UML state ma- 
chines: 

— A B is read as sometime in the future, A and B will occur and the 
next occurrences of A and B will be simultaneous. The definition in LNi 1 is 
A^+ B = A 4 B AB 4 A. 

— inst(A) is read as at the current instant A is true in the form of an isolated
point. The definition is inst(A) = ⊖¬A ∧ A ∧ ¬⊕A.

— Int + (A) is read as in the future A will occur, and the next time that A 
occurs, A holds at a finite closed interval. The definition is Int + (A) = (A « + 
(A A ©A A O-i A)) A (©A -> -.A). 
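
The first two readings can be illustrated over finite sets of instants. The sketch below is an interpretation of the informal readings only, not a transcription of the formal definitions: it assumes that h(A) is a finite set of integers, that the next-occurrence function is the one of Definition 3, and that an isolated point means A holds now but neither at the previous nor at the next instant. All function names are hypothetical.

```python
import math
from typing import Set


def next_occurrence(t: int, holds: Set[int]) -> float:
    """First instant strictly after t in `holds`, or +infinity if none."""
    later = [u for u in holds if u > t]
    return min(later) if later else math.inf


def simultaneous_next(t: int, holds_a: Set[int], holds_b: Set[int]) -> bool:
    """A and B both occur after t and their next occurrences coincide."""
    m_a = next_occurrence(t, holds_a)
    return m_a < math.inf and m_a == next_occurrence(t, holds_b)


def inst(t: int, holds: Set[int]) -> bool:
    """A is true at t as an isolated point."""
    return (t - 1) not in holds and t in holds and (t + 1) not in holds


h_a = {4, 7, 8}
print(inst(4, h_a), inst(7, h_a))
print(simultaneous_next(0, {3, 9}, {3}))
```

Here instant 4 is an isolated point of A, while 7 is not because A also holds at 8. The next occurrences of the two sets {3, 9} and {3} after instant 0 coincide at 3.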

Interval Relations. The expressive power of LNi 1 regarding non-hereditary 
expressions is at least equal to that of Allen [1] and Halpern and Shoham [10].
Then, in our logic it is possible to define the standard temporal relations between 
non hereditary and/or hereditary expressions [16]. We present here three interval 
connectives needed to formalize UML state machines whose reading is the usual 
in [1, 10]: 

— ab + (ai, fii>) is the abutment relation between two event executions. The defi- 
nition in LNi 1 is 

ab + (ai,Pi /) = def O t aiA | ai «+f A' A (Vn,n' T ai =^t a n A | A' =U M 

— ab+ h _ h (ai, A) is the abutment relation between the non-hereditary execution 
ai and the hereditary expression A. Its definition is 

ab nh-h( a i’ A ) ~def Int + (A ) A I ai w+T A A (Vn | a* a n ) 

— | sec + (ai,Pi>,8i") represents the starting point of a sequence of three non- 
hereditary executions. 

T sec+(a;,A',7/") =de/t aiA | ai w+T PiA j A' « + t 7 1 " A (Vn,n' | A' =^T 

PnA T 7 1" =^T In') 



3 UML State Machines in LNi 1 

As we mentioned in the introduction, the main objective of this work is to illus- 
trate the usefulness of temporal logic in knowledge representation and, specifi- 
cally, in software engineering. 

The specific problem that we try to solve is the lack of formalization in 
UML, which makes it difficult to verify the models. The authors of UML have



4 In [2] we introduce these definitions formally. 
approached this by introducing the OCL language, although it is only used to
formalize well-formedness rules whereas the semantics is described textually. As 
a matter of fact, the formalization of UML semantics is one of the most active 
research tasks in software engineering, and specifically the formalization of the 
UML state machines. This is reflected in several works that have been analyzed 
in detail in [17]. 

In this section we show how LNi 1 improves previous works devoted to
providing a formal semantics to UML state machines 5 . Due to space limitations,
we focus only on the novelties of this work. Readers unfamiliar with UML can find
a detailed description in [15] and a full specification of UML state machines
using the propositional temporal logic LNint-e in [17].

The main contribution of this work is the formalization of the use of parame- 
ters in the different operations of the state machine, which is an essential aspect 
in any software system and in particular in the communication among objects. 

State machines operations need a time-interval to be executed. So, they are 
represented by non-hereditary atoms, since we must represent complete execu- 
tions of operations. Now we present different kinds of operations: 

1. Actions are operations whose executions cannot be aborted. To specify an
action act with parameters o1, ..., on, belonging to the classes c1, ..., cn
respectively, a predicate act will be included in the language. That
predicate will have as many arguments as the state machine action has.

If the argument corresponds to an object, we must also specify the class to
which it belongs. Therefore, the representation of the action act(o1, ..., on)
will be act_l(o1 : c1, ..., on : cn). In the specification an ownership relation
between each object and its class must be included.

2. Activities can be aborted by an exiting transition and so the duration of
each execution may be different. LNi 1 overcomes an important problem in
our previous formalization based on LNint-e, in which it was required that
all executions of a non-hereditary type have the same duration, and a
connective, named ExecAbort, represented an activity being aborted. This
mechanism is no longer necessary in LNi 1 .

3. Send events are another option among the actions of a transition or state 
and they represent the communication between objects. Their formalization 
(impossible in LNint-e and in other works based on propositional logic) is 
another main contribution of LNi 1 . Send events are actions and they are 
also specified by non hereditary predicates. 

Generically, we consider a predicate send-ev to specify the sending of a
call event ev. That predicate will have as many arguments as the event sent
has and one more to represent the target object (and its class). Therefore,
the representation of the sending of event $ev(o_1, \ldots, o_n)$ to an object
$o$ of the class $c$ will be $send\text{-}ev_l(o : c;\ o_1 : c_1, \ldots, o_n : c_n)$,
where $l$ is an unused label, and $c_1, \ldots, c_n$ are the classes of
$o_1, \ldots, o_n$, respectively.



5 We have to remark that our formalization respects the execution model of state
machines proposed in UML, based on the run-to-completion assumption [15].
In our approach, the start instant of the event corresponds with the instant
the event is sent, and the final instant with the end of execution of the invoked 
operation of the target object. This characteristic allows us to specify the 
behavior of the synchronous communication between objects. As indicated 
in the UML semantics [15], the object that sends the event is blocked until 
the invoked operation is completed. 

The formalization of the rest of the elements of UML state machines is a
natural extension of that proposed for LNint-e [17]. In the formalization of those
elements the LNi 1 temporal connectives are used intensively. As an example,
we show the formalization of the generic transition connective 6 that represents
the situation depicted in the figure below.



Transition(StateA, ActivA_{l2}(o2 : c2), ExitActionA_{l3}(o3 : c3),
           Event, Guard, send-eventS_l(o : c; o4 : c4),
           StateB, EntryActionB_{l5}(o5 : c5), ActivB_{l6}(o6 : c6))   =def

□( StateA ∧ Event ∧ Guard →
     ⊕¬StateA ∧
        // if the transition is fired, the state ends
     [ActivA_{l2}(o2 : c2) → ⊕↓ActivA_{l2}(o2 : c2)] ∧
        // if the activity is in process, it is aborted
     ⊕ sec+(ExitActionA_{l3}(o3 : c3),
            send-eventS_l(o : c; o4 : c4), EntryActionB_{l5}(o5 : c5)) ∧
        // exit, event, and entry actions are executed
     ab_{nh-h}(EntryActionB_{l5}(o5 : c5), StateB) ∧
        // target state is started
     ab+(EntryActionB_{l5}(o5 : c5), ActivB_{l6}(o6 : c6)) )
        // and the execution of the target state activity is started



[Figure: state diagram. Source state StateA (entry/entryActionA(o1:c1),
do/activA(o2:c2), exit/exitActionA(o3:c3)); transition labelled
event[guard] / o.eventS(o4:c4); target state StateB
(entry/entryActionB(o5:c5), do/activB(o6:c6), exit/exitActionB(o7:c7)).]



4 Conclusions and Future Work 

In this work we present LNi 1 , a first-order temporal logic that combines points
and intervals and the absolute and relative approaches. It is a many-sorted logic,
whose domains are finite or countable. We have demonstrated the practical
usefulness of LNi 1 on an open problem, namely that of providing a formal
semantics to state machines, the main behavior model of UML, the standard
modeling language for software systems.

As future work, an important aspect in logic is the availability of reasoning 
methods. In that sense, our research group has achieved great advances in the 
development of automated theorem provers for temporal logics [6,4], that will 



6 Transitions are represented by means of connectives that relate all the elements
that can participate in a transition, namely: events, guards and actions, as well
as the source and target states.
help to develop reasoning methods for LNi 1 . In this line of thought, we have
defined a normal form that is independent from the deduction method for
LNint-e [8]. We will take into consideration some other recent works on
first-order temporal logics [9]. Our intention is to integrate deduction methods
into a CASE tool of which we already have a prototype. It allows us to edit
statecharts and to automatically generate the associated LNi 1 formulae.
Abstract. This paper investigates the optimization by fold/unfold of
declarative programs that integrate the best features from both functional
and logic programming. Transformation sequences are guided by
a mixed strategy which, in three low-level transformation phases, successfully
combines two well-known heuristics (composition and tupling), thus
avoiding the construction of intermediate data structures and redundant
sub-computations. In particular, whereas composition is able to produce
a single function definition for some nested (composed) functions, the
tupling method merges non-nested function calls into a new function
definition called eureka. We solve the non-trivial problem of discovering
the set of calls to be tupled in an incremental way, i.e. by chaining
different eureka definitions where only non-nested calls sharing common
variables are taken into account. Moreover, by appropriately combining
both strategies, together with a simplification pre-process based on a
kind of normalization, we automatically optimize a wide range of programs
(with nested and/or non-nested function calls) at a very low cost.



1 Introduction 

Functional logic programming languages combine the operational methods and 
advantages of the most important declarative programming paradigms, namely 
functional and logic programming. The operational principle of such modern 
multi-paradigm declarative languages is usually based on narrowing. A narrow- 
ing step instantiates variables in an expression and applies a reduction step to 
a redex of the instantiated expression. Needed narrowing is the currently best 
narrowing strategy for functional logic programs due to its optimality properties 
w.r.t. the length of derivations and the number of computed solutions [3], and 
it can be efficiently implemented by pattern matching and unification. 

The fold/unfold transformation approach was first introduced in [5] to opti- 
mize functional programs and then used for logic programs [10]. A transformation 
methodology for lazy functional logic programs was presented in [2], where we 
also introduced an implemented prototype (the Synth tool) which has been suc- 
cessfully tested with several applications in the field of Artificial Intelligence. The 



* This work was partially supported by CICYT under grant TIC 2001-2705-C03-03. 
also called “rules+strategies” approach is commonly based on the construction,
by means of a strategy, of a sequence of equivalent programs each obtained from
the preceding ones by using an elementary transformation rule. The essential
rules are folding and unfolding, i.e., contraction and expansion of subexpressions
of a program using the definitions of this program or of a preceding one [9].
Although they are difficult to automate, there exists a large class of program
optimizations, such as composition and tupling (which is able to obtain even
super-linear speedups), that can be achieved by fold/unfold transformations.

Composition essentially consists of the merging of nested function calls, while
the tupling strategy merges separate (non-nested) function calls with some
common arguments (variables) into a single call to a (possibly new) recursive
function which returns a tuple of the results of the separate calls, thus avoiding
either multiple accesses to the same data structures or common subcomputations.
In [1] and [8] we have investigated automatic strategies for performing
composition and tupling, respectively, in a multi-paradigm declarative
(functional-logic) setting. Moreover, as we show in this paper, by simplifying
(normalizing) rules before applying composition and/or tupling, we can increase
the efficiency and the applicability scope of the transformations. Since both
strategies can be seen as antagonistic in some senses, an important goal of the
present work is to build a mixed heuristic that exploits the complementary
effects of both methods in order to achieve better results (i.e., more general
and powerful optimizations).

However, a weak point of these techniques is the discovery of the appropriate
set of new (eureka) definitions which make the pursued optimizations
possible [5,10,9]. In contrast with prior non-automatic and rather complicated
approaches (tupling has only been semi-automated to some extent, and only in
the pure functional setting [6,9]), this work starts from a previous (more
realistic and simpler) method presented in [8], where we described the internal
structure of tupling. Moreover, in the present approach we substantially refine
that algorithm by naturally embedding into it some simplification pre-processes,
incremental capabilities, and more effective tests for termination. In
particular, by performing recursive calls to the general tupling algorithm, we
obtain more and more refined new eureka definitions until reaching the intended
optimization, when possible.

The structure of the paper is as follows. After recalling some basic definitions,
we introduce the basic transformation rules and illustrate their use in Section 2.
The next section describes the different transformation phases that constitute
the core of our composition and tupling algorithms. Section 4 proposes some
simplification procedures based on normalization and composition, and shows
that the three low-level transformation phases of tupling can be chained in an
incremental way in order to reinforce their respective strengths. Finally,
Section 5 concludes. More details can be found in [7].



2 Fold/Unfold Transformation Sequences 

In this work we consider a signature $\Sigma$ partitioned into a set $\mathcal{C}$
of constructors and a set $\mathcal{F}$ of defined functions. The set of
constructor terms (with variables) is obtained by using symbols from
$\mathcal{C}$ (and a set of variables $\mathcal{X}$). The set of variables
occurring in a term $t$ is denoted by $Var(t)$. We write $\overline{o_n}$ for the
list of objects $o_1, \ldots, o_n$. A pattern is a term of the form
$f(\overline{d_n})$ where $f/n \in \mathcal{F}$ and $d_1, \ldots, d_n$ are
constructor terms (with variables). A term is linear if it does not contain
multiple occurrences of one variable. $t|_p$ denotes the subterm of $t$ at a
given position $p$, and $t[s]_p$ denotes the result of replacing the subterm
$t|_p$ by the term $s$. We denote by $\{x_1 \mapsto t_1, \ldots, x_n \mapsto t_n\}$
the substitution $\sigma$ with $\sigma(x_i) = t_i$ for $i = 1, \ldots, n$ (with
$x_i \neq x_j$ if $i \neq j$), and $\sigma(x) = x$ for all other variables $x$.
The application of a substitution $\sigma$ to an expression $e$ is denoted by
$\sigma(e)$.

A set of rewrite rules $l \to r$ such that $l \notin \mathcal{X}$ and
$Var(r) \subseteq Var(l)$ is called a term rewriting system (TRS). The terms $l$
and $r$ are called the left-hand side (lhs) and the right-hand side (rhs) of the
rule, respectively. A TRS $\mathcal{R}$ is left-linear if $l$ is linear for all
$l \to r \in \mathcal{R}$. A TRS is constructor-based (CB) if each left-hand
side is a pattern. In the remainder of this paper, a functional logic program is
a left-linear CB-TRS without overlapping rules (i.e. the lhs's of two different
program rules do not unify). A rewrite step is an application of a rewrite rule
to a term, i.e., $t \to_{p,R} s$ if there exists a position $p$ in $t$, a rewrite
rule $R = (l \to r)$ and a substitution $\sigma$ with $t|_p = \sigma(l)$ and
$s = t[\sigma(r)]_p$. The operational semantics of modern integrated languages
is usually based on (needed) narrowing, a combination of variable instantiation
and reduction. Formally, $s \leadsto_{p,R,\sigma} t$ is a narrowing step if $p$
is a non-variable position in $s$ and $\sigma(s) \to_{p,R} t$. We denote by
$t_0 \leadsto^{*}_{\sigma} t_n$ a sequence of narrowing steps
$t_0 \leadsto_{\sigma_1} \cdots \leadsto_{\sigma_n} t_n$ with
$\sigma = \sigma_n \circ \cdots \circ \sigma_1$.

We now present our program transformation method based on the fold/unfold
methodology. First, we recall from [2] the set of (strong) correct/complete
transformation rules that constitute the core of our transformation system.
Basically, we start with an initial declarative program, $\mathcal{R}_0$, and
construct a transformation sequence $\mathcal{R}_0, \ldots, \mathcal{R}_n$,
$n > 0$, by applying the following transformation rules:

Definition Introduction: we may get program $\mathcal{R}_{k+1}$ by adding to
$\mathcal{R}_k$ a new rule (called “definition rule” or “eureka”) of the form
$f(\overline{x_n}) \to r$, where $Var(r) = \{\overline{x_n}\}$ and $f$ is a new
function symbol not occurring in $\mathcal{R}_0, \ldots, \mathcal{R}_k$.

Unfolding: let $(l \to r) \in \mathcal{R}_k$; then, we may get program
$\mathcal{R}_{k+1}$ as follows:
$\mathcal{R}_{k+1} = (\mathcal{R}_k \setminus \{l \to r\}) \cup \{\sigma(l) \to r' \mid r \leadsto_{\sigma} r' \text{ in } \mathcal{R}_k\}$.
We call a rewriting-based unfolding step a normalizing step (thus implying
that $\sigma(l) = l$).

Folding: let $(l \to r) \in \mathcal{R}_k$ be a non-eureka rule,
$(l' \to r') \in \mathcal{R}_j$, $0 \le j \le k$, an eureka rule, and $p$ a
position of $r$ such that $r|_p = \sigma(r')$; then, we may get program
$\mathcal{R}_{k+1}$ as follows:
$\mathcal{R}_{k+1} = (\mathcal{R}_k \setminus \{l \to r\}) \cup \{l \to r[\sigma(l')]_p\}$.

Example 1. In order to optimize a program, the three rules above should be 
applied according to some appropriate strategy as it is the case of composition 
[5] . The following original program defines typical operations for computing the 
length and the sum of the elements of a list, and for generating a (descending) 
list of consecutive natural numbers: 



1 An eureka (or definition rule) maintains its status only as long as it remains un- 
changed, i.e., once it is transformed it is not considered an eureka rule anymore. 
(R1) : len([]) -> 0          (R2) : len([H|T]) -> s(len(T))
(R3) : sum([]) -> 0          (R4) : sum([H|T]) -> H + sum(T)
(R5) : gen(0) -> []          (R6) : gen(s(X)) -> [s(X)|gen(X)]

If we want to add a new rule for obtaining the sum of the first X natural
numbers, a naive possibility is the following one:
(R7) : sum_from(X) -> sum(gen(X)). We can optimize this definition by avoiding
the generation and subsequent traversal of the intermediate list by applying
the composition strategy as follows:



1. Definition introduction:
   (R8) : new1(X) -> sum(gen(X))

2. Unfold rule R8:
   (R9) : new1(0) -> sum([])          (R10) : new1(s(X)) -> sum([s(X)|gen(X)])

3. Normalize rules R9 and R10:
   (R11) : new1(0) -> 0               (R12) : new1(s(X)) -> s(X) + sum(gen(X))

4. Fold R12 using R8:
   (R13) : new1(s(X)) -> s(X) + new1(X)

5. And finally, fold R7 w.r.t. R8:
   (R14) : sum_from(X) -> new1(X)



And now, the enhanced definition for sum_from corresponds to rule R14
together with the definition of the auxiliary symbol new1 (rules R11 and R13).
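The effect of the composition steps above can be illustrated with a small Python transcription. This is only a sketch under stated assumptions: Peano numbers s(X) are modelled as Python integers, lists as Python lists, and the function names `sum_from_naive` and `sum_from` are chosen here for illustration (the source only names the rules R7, R11, R13 and R14).

```python
def gen(x):
    """R5/R6: gen(0) -> [], gen(s(X)) -> [s(X) | gen(X)]."""
    return [] if x == 0 else [x] + gen(x - 1)


def sum_from_naive(x):
    """R7: builds the intermediate list and then traverses it."""
    return sum(gen(x))


def new1(x):
    """R11/R13: new1(0) -> 0, new1(s(X)) -> s(X) + new1(X)."""
    return 0 if x == 0 else x + new1(x - 1)


def sum_from(x):
    """R14: the composed definition never constructs a list."""
    return new1(x)
```

Both versions compute the same value; the second one simply skips the intermediate list that `gen` would build.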

However, for the case of tupling we need to extend the core of our
transformation system with an abstraction rule that allows us to flatten nested
function calls or to pack them into a tuple as follows 2 :

Abstraction: let $R = (l \to r) \in \mathcal{R}_k$ be a rule and let
$\overline{P_j}$ be sequences of disjoint positions in $r$ such that
$r|_p = e_i$ for all $p$ in $P_i$, $i = 1, \ldots, j$, i.e.,
$r = r[\overline{e_j}]_{\overline{P_j}}$; then, we may get program
$\mathcal{R}_{k+1}$ as follows:
$\mathcal{R}_{k+1} = (\mathcal{R}_k \setminus \{R\}) \cup \{l \to r[\overline{z_j}]_{\overline{P_j}} \text{ where } \langle z_1, \ldots, z_j \rangle = \langle e_1, \ldots, e_j \rangle\}$,
such that $\overline{z_j}$ are fresh variables, and local declarations are
expressed with the typical where construction of functional programming [5,4,9].

Example 2. In order to illustrate the use of abstraction when optimizing a
program, consider the following rule defining the average of the elements of a
given list: (R15) : average(L) -> sum(L)/len(L). Observe that this definition
traverses the same list twice, which can be avoided by applying tupling as follows:



1. Definition introduction:
   (R16) : new2(L) -> (sum(L), len(L))

2. Unfold rule R16:
   (R17) : new2([]) -> (0, len([]))
   (R18) : new2([H|T]) -> (H + sum(T), len([H|T]))

3. Normalize rules R17 and R18:
   (R19) : new2([]) -> (0, 0)
   (R20) : new2([H|T]) -> (H + sum(T), s(len(T)))

4. Abstract R20:
   (R21) : new2([H|T]) -> (H + U, s(V)) where (U, V) = (sum(T), len(T))



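2 For a sequence of positions $\overline{P} = \overline{p_n}$, we let
$t[\overline{s_n}]_{\overline{P}} = (((t[s_1]_{p_1})[s_2]_{p_2}) \ldots [s_n]_{p_n})$.
By abuse, we denote $t[\overline{s_n}]_{\overline{P}}$ by $t[s]_{\overline{P}}$
when $s_1 = \ldots = s_n = s$, as well as
$((t[\overline{s_1}]_{\overline{P_1}}) \ldots [\overline{s_n}]_{\overline{P_n}})$
by $t[\overline{s_n}]_{\overline{P_n}}$.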
5. Fold R21 with R16:
   (R22) : new2([H|T]) -> (H + U, s(V)) where (U, V) = new2(T)

6. Finally, after similar abstraction and folding steps on R15, we obtain the
   desired definition (R23) : average(L) -> U/V where (U, V) = new2(L), that
   only traverses a list once thanks to the use of the improved definition for
   new2 (rules R19 and R22).
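As an illustration of what tupling buys in Example 2, the following Python sketch transcribes rules R19, R22 and R23. It assumes Python lists and integers in place of the term representation of the paper; the name `average_naive` is hypothetical and stands for rule R15.

```python
def average_naive(xs):
    """R15: two separate traversals (sum and len) of the same list."""
    return sum(xs) / len(xs)


def new2(xs):
    """R19/R22: returns the pair (sum(xs), len(xs)) in a single traversal."""
    if not xs:
        return (0, 0)
    u, v = new2(xs[1:])          # where (U, V) = new2(T)
    return (xs[0] + u, 1 + v)    # (H + U, s(V))


def average(xs):
    """R23: average(L) -> U/V where (U, V) = new2(L)."""
    u, v = new2(xs)
    return u / v
```

The pair returned by `new2` plays the role of the tuple introduced by the eureka R16; like the original rule, `average` is undefined (division by zero) on the empty list.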

3 Structure of Transformation Strategies 

In this section we decompose the internal structure of the transformation
strategies of composition and tupling. Following [1] and [8], we focus our
attention separately on the three transformation stages (definition
introduction, unfolding and folding) shown in Table 1, where each phase consists
of several steps done with the transformation rules described before.



Table 1. The Tupling_Algorithm

INPUT:   Initial Program $\mathcal{R}$ and Program Rule $R = (l \to r) \in \mathcal{R}$

BODY:    ********** DEFINITION INTRODUCTION PHASE **********
         1. Let $T = \langle t_1, \ldots, t_n \rangle$ be the set (without repetitions)
            of pattern subterms sharing common variables in $r$
         2. Apply the DEFINITION INTRODUCTION RULE to generate:
            $R_{def} = (f_{new}(\overline{x}) \to T)$

         **************** UNFOLDING PHASE ****************
         3. Let $\mathcal{R}_{unf} = \{R_{def}\}$ be a program
         4. Repeat $\mathcal{R}_{unf}$ = NORMALIZE(UNFOLD($\mathcal{R}_{unf}$, $\mathcal{R}$), $\mathcal{R}$)
            until every rule $R' \in \mathcal{R}_{unf}$ verifies TEST($R'$, $R_{def}$) > 0

         ***************** FOLDING PHASE *****************
         5. Let $\mathcal{R}_{fold} = (\mathcal{R}_{unf} \cup \mathcal{R})$ be a program
         6. For every rule $R' \in \mathcal{R}_{fold}$ verifying TEST($R'$, $R_{def}$) = 2
            $\mathcal{R}_{fold} = (\mathcal{R}_{fold} - \{R'\}) \cup \{$FOLD(ABSTRACT($R'$, $R_{def}$))$\}$

OUTPUT:  Transformed Program $\mathcal{R}_{fold}$



Definition Introduction Phase. This is the eureka generation phase, which is
the key point for a transformation strategy to proceed [5,9,10,2]. For the case
of the composition strategy, eureka definitions can be easily identified since
they correspond to nested calls. However, for the tupling strategy the problem
is much more difficult, mainly due to the fact that the (non-nested) calls to
be tupled may be arbitrarily distributed in the rhs of a rule. Sophisticated
static analysis techniques have been developed in the literature using
dependency graphs ([6]), m-dags ([4]) and other intricate structures. The main
problems appearing in such approaches are that the analyses are not as general
as wanted (they can fail even though the program admits tupling optimizations),
and they are time and space consuming. In order to avoid these risks, our
approach generates eureka definitions following a very simple strategy that
obtains high levels of efficiency (see steps 1 and 2 in Table 1 and step 1 in
Example 2). Since it is not usual that
Table 2. Function TEST

INPUT:   Original eureka   $f_{new}(\overline{x}) \to T$ such that $T = \langle t_1, \ldots, t_n \rangle$
         Unfolded rule     $\sigma(f_{new}(\overline{x})) \to T'$ such that $T' = \langle t'_1, \ldots, t'_n \rangle$

BODY:    1. Let $S = \langle s_1, \ldots, s_m \rangle$ be the set (without repetitions)
            of pattern subterms sharing common variables in $T'$
         2. Look if one of the following Stopping Conditions holds:
            SC1: $m < n$                          => return 1   * base case definition *
            SC2: $m = n$ and $\theta(T) = S$      => return 2   * foldability *
            SC3: $m > n$                          => return 3   * new tupling loop *
            Otherwise:                            => return 0   * continue unfolding *



terms to be tupled contain new function calls as parameters, we cope with this
fact in our definition by requiring that only pattern subterms (sharing at least
a common variable) of r be collected. Otherwise, the considered subterms
would contain nested calls which should be more appropriately transformed by
composition instead of tupling, as we will see in Section 4.

Unfolding Phase. During this phase, the eureka definition $R_{def}$ generated in
the previous phase is unfolded, possibly several times (at least once), using the
original definitions of program $\mathcal{R}$, and returning a new program
$\mathcal{R}_{unf}$ which represents the unfolded definition of $f_{new}$. In each
iteration of this phase, once a rule in $\mathcal{R}_{unf}$ is unfolded, it is
removed and replaced with the rules obtained after unfolding it in the resulting
program $\mathcal{R}_{unf}$, which is dynamically updated in our algorithm. The key
point to stop the unfolding loop (see steps 3 and 4 in Table 1, where unfolded
rules are normalized as much as possible once obtained) is the use of function
TEST described in Table 2, which compares the tuples of patterns sharing
variables of eurekas and unfolded rules in order to decide when a base case has
been reached, a regularity has been found or a new tupling cycle must be
(incrementally) re-initiated.
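The behaviour of function TEST can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used by the authors: terms are encoded as nested tuples (a hypothetical encoding), variables as strings, the set `DEFINED` of defined function symbols is fixed by hand, and the "sharing common variables" filter is omitted, so every distinct pattern subterm is collected. The condition $\theta(T) = S$ is checked by trying to match T against some ordering of S.

```python
from itertools import permutations

# Hypothetical term encoding: variables are strings ("N"), applications are
# tuples ("symbol", arg1, ..., argk). Symbols not in DEFINED are constructors.
DEFINED = {"h", "app"}


def is_var(t):
    return isinstance(t, str)


def is_constructor_term(t):
    return is_var(t) or (t[0] not in DEFINED
                         and all(is_constructor_term(a) for a in t[1:]))


def is_pattern(t):
    """A defined symbol applied to constructor terms only."""
    return (not is_var(t) and t[0] in DEFINED
            and all(is_constructor_term(a) for a in t[1:]))


def patterns(t, acc=None):
    """Distinct pattern subterms of t (assumption: no variable-sharing filter)."""
    acc = [] if acc is None else acc
    if not is_var(t):
        if is_pattern(t) and t not in acc:
            acc.append(t)
        for a in t[1:]:
            patterns(a, acc)
    return acc


def match(pat, term, theta):
    """Extend theta so that theta(pat) == term; return None if impossible."""
    if is_var(pat):
        if pat in theta:
            return theta if theta[pat] == term else None
        return {**theta, pat: term}
    if is_var(term) or pat[0] != term[0] or len(pat) != len(term):
        return None
    for p, s in zip(pat[1:], term[1:]):
        theta = match(p, s, theta)
        if theta is None:
            return None
    return theta


def instance_of(T, S):
    """True if one substitution maps the tuple T onto some ordering of S."""
    for perm in permutations(S):
        theta = {}
        for t, s in zip(T, perm):
            theta = match(t, s, theta)
            if theta is None:
                break
        else:
            return True
    return False


def stop_test(T, unfolded_rhs):
    """Return 1 (SC1), 2 (SC2), 3 (SC3) or 0 (continue unfolding)."""
    S = patterns(unfolded_rhs)
    m, n = len(S), len(T)
    if m < n:
        return 1
    if m == n and instance_of(T, S):
        return 2
    if m > n:
        return 3
    return 0
```

With the eureka tuple (h(N,A,C,B), h(N,C,B,A)), a right-hand side containing no calls to `h` yields 1, while a right-hand side containing three distinct calls to `h` yields 3, which is the situation that triggers a new tupling cycle.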

Folding Phase. The aim of this phase (steps 5 and 6 of Table 1) is not only to
obtain efficient recursive definitions of the new symbol $f_{new}$ (initially
defined by the eureka $R_{def}$), but also to redefine old function symbols in
terms of the optimized definition of $f_{new}$. Which of the two is achieved
depends on the rule $R$ to be folded, which may belong to the unfolded program
obtained in the previous phase ($\mathcal{R}_{unf}$), or to the original program
($\mathcal{R}$), respectively. The application of a folding step on a given
rule is similar to renaming a subterm in its rhs with an instance of
$f_{new}(\overline{x})$. For the case of the composition strategy, folded
(renamed) subterms are obviously nested (composed) expressions. However, before
performing folding steps when applying tupling, we must first generate the tuple
to be folded by means of abstraction steps as follows. Firstly we consider the
case $R \in \mathcal{R}_{unf}$. Remember that
$R_{def} = (f_{new}(\overline{x}) \to T)$ where $T = \langle t_1, \ldots, t_n \rangle$.
If $R = (\sigma(f_{new}(\overline{x})) \to r')$ satisfies the foldability
condition (SC2) explained in the previous phase, then it is possible to abstract
$R$ according to tuple $T$ and generate the new rule
$\sigma(f_{new}(\overline{x})) \to r'[\overline{z_n}]_{\overline{P_n}}$ where
$\langle z_1, \ldots, z_n \rangle = \theta(\langle t_1, \ldots, t_n \rangle)$,
as illustrated by step 4 and rule R21 in Example 2. After a subsequent folding
step using $R_{def}$ (see rule R22 and step 5 in Example 2), we obtain a
recursive definition of $f_{new}$, as desired. The case when
$R \in \mathcal{R}$ is perfectly analogous (see rule R23 in the same example).

4 Incremental Tupling with Simplification Pre-process 

We now propose three final improvements which can be naturally embedded in
our tupling algorithm as shown in Table 3. The global algorithm submits all rules
of a given program to the transformation process (step 2 in the table). Before
being treated by tupling (step 2.3), each program rule R is first simplified by
normalization (generating rule R' in step 2.1) and composition (generating rule
R'' in step 2.2). Finally, step 2.4 introduces our notion of incremental tupling.

Improvement Based on “Normalization+Tupling”. As described in Sec- 
tion 3, normalization is a powerful tool systematically used for simplifying rules 
during the unfolding phase. Moreover, its benefits also hold when it is applied 
before starting a (composition and) tupling loop, as the following example shows. 

Example 3. The following program defines function fact for computing the
factorial of a given number, and fli and fli_ev for generating lists with
factorials of consecutive natural numbers (only even numbers are considered in
fli_ev):



(R36) : fact(0) -> s(0)
(R37) : fact(s(N)) -> s(N) * fact(N)
(R38) : fli(0) -> []
(R39) : fli(s(N)) -> [fact(s(N))|fli(N)]
(R40) : fli_ev(0) -> []
(R41) : fli_ev(s(s(X))) -> [fact(s(s(X)))|fli_ev(X)]
(R42) : fli_ev(s(0)) -> []









If we try to improve both definitions for fli and fli_ev, our tupling algorithm
would proceed by creating eurekas with tuples (fact(s(X)), fli(X)) and
(fact(s(s(X))), fli_ev(X)), respectively, and continuing with never-ending
unfolding loops, as the reader may easily check. In particular, the second case
has been proposed in [6] as a non-trivial example to be solved by tupling.
However, in contrast with the rather complicated solution (based on involved
eureka analyses) presented there, we propose a much simpler way to solve the



Table 3. Improved Algorithm: Normalization+Composition+Incremental Tupling

INPUT:   Original Program $\mathcal{R}$

BODY:    1. Initialize program $\mathcal{R}' = \mathcal{R}$
         2. For each rule $R \in \mathcal{R}'$ do
            2.1. $\{R'\}$ = NORMALIZE($\{R\}$, $\mathcal{R}'$)
            2.2. $\mathcal{R}'$ = Composition_Algorithm(($\mathcal{R}' - \{R\}$) $\cup$ $\{R'\}$, $R'$)
            2.3. $\mathcal{R}'$ = Tupling_Algorithm($\mathcal{R}'$, $R''$)
            2.4. For every rule $R^* \in \mathcal{R}'$ verifying TEST($R^*$, $R_{def}$) = 3
                 $\mathcal{R}'$ = Tupling_Algorithm($\mathcal{R}'$, $R^*$)

OUTPUT:  Transformed Program $\mathcal{R}'$
problem. The idea is to simply normalize rules R39 and R41 before they are
transformed by tupling. So, after normalizing R39 we obtain the new rule
(R43) : fli(s(X)) -> [s(X) * fact(X)|fli(X)]. Moreover, a normalizing step on R41
generates (R44) : fli_ev(s(s(X))) -> [s(s(X)) * fact(s(X))|fli_ev(X)], which also
admits a final normalizing step to become
(R45) : fli_ev(s(s(X))) -> [s(s(X)) * s(X) * fact(X)|fli_ev(X)]. Now, starting
with this pair of fully normalized rules, we can generate the respective pair of
appropriate eureka definitions (R46) : new5(X) -> (fact(X), fli(X)) and
(R47) : new5_ev(X) -> (fact(X), fli_ev(X)), which can now be easily treated by
tupling.
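To make the benefit of the eureka R46 concrete, here is a Python sketch of the tupled definition one would expect to reach for fli. The recursive clause of `new5` below is not given in the text; it is derived here, as an assumption, by unfolding (fact(s(X)), fli(s(X))) with R37 and R43 and folding with R46. Peano numbers are modelled as Python integers.

```python
def fact(n):
    """R36/R37."""
    return 1 if n == 0 else n * fact(n - 1)


def fli(n):
    """R38/R39: recomputes a full factorial for every list element."""
    return [] if n == 0 else [fact(n)] + fli(n - 1)


def new5(n):
    """Tupled eureka (R46): returns the pair (fact(n), fli(n))."""
    if n == 0:
        return (1, [])
    u, v = new5(n - 1)            # (U, V) = new5(X)
    return (n * u, [n * u] + v)   # derived clause, an assumption of this sketch


def fli_tupled(n):
    """fli(s(X)) -> [s(X) * U | V] where (U, V) = new5(X)."""
    if n == 0:
        return []
    u, v = new5(n - 1)
    return [n * u] + v
```

Each factorial is now obtained from the previous one with a single multiplication instead of being recomputed from scratch.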

Improvement Based on “Composition+Tupling”. On the other hand, we
now consider a second class of programs where the optimization to be pursued
could not proceed by considering only pattern subterms (i.e., not containing
nested calls) to be tupled, since, for those cases, the tuple generated by our
method would contain only a single element. Remember that the composition
strategy optimizes the definition of nested expressions by generating a single
new function definition. Hence, after applying the composition strategy as much
as possible, new patterns (representing calls to enhanced eureka definitions) may
arise in transformed rules, and this is just what we need before initiating
tupling.

Example 4. Continuing with the examples of Section 2, consider the following
definition (R48) : f(N) -> sum(gen(N))/len(gen(N)). Observe that this function
computes the formula $f(n) = (\sum_{i=0}^{n} i)/n$ in a very inefficient way:
the same list is created twice (two calls to gen) and traversed twice (one call
to sum and one more to len). Unfortunately, our tupling strategy is not able to
optimize the previous definition, since it only considers the unique subterm
gen(N) when building the initial eureka definition. However, we observe that the
composition strategy can be applied twice, to expressions sum(gen(N)) and
len(gen(N)) in rule R48. For the first case we obtain the optimized definition
of new1 seen in Example 1, and for the second one it is easy to build (by
composition) the following new definition: (R49) : new6(0) -> 0,
(R50) : new6(s(N)) -> s(new6(N)). After this pre-process, the original
definition for f becomes f(N) -> new1(N)/new6(N), which allows us to start a
tupling loop in order to finally obtain the following definition (fully
optimized, without the need of creating and traversing intermediate lists):
f(N) -> U/V where (U, V) = new7(N), having that new7(0) -> (0, 0) and
new7(s(N)) -> (s(N) + U, s(V)) where (U, V) = new7(N).
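The combined effect of composition followed by tupling in Example 4 can be checked with the following Python sketch. It assumes integers for Peano numbers and Python lists for lists; `f_naive` is a hypothetical name for rule R48, and `f` transcribes the final definition based on new7.

```python
def gen(n):
    """gen(0) -> [], gen(s(X)) -> [s(X) | gen(X)]."""
    return [] if n == 0 else [n] + gen(n - 1)


def f_naive(n):
    """R48: creates the list twice and traverses it twice."""
    return sum(gen(n)) / len(gen(n))


def new7(n):
    """new7(0) -> (0, 0); new7(s(N)) -> (s(N) + U, s(V)) where (U, V) = new7(N)."""
    if n == 0:
        return (0, 0)
    u, v = new7(n - 1)
    return (n + u, 1 + v)


def f(n):
    """f(N) -> U/V where (U, V) = new7(N); no intermediate list is built."""
    u, v = new7(n)
    return u / v
```

As with the original rule, both versions are undefined for n = 0, where the divisor is 0.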

Improvement Based on “Tupling+Tupling”. Our incremental version of 
the tupling algorithm highly increases its power, without seriously altering its 
core, by simply generating, optimizing and chaining several eureka definitions. 

Example 5. The classical Towers of Hanoi problem can be solved in a natural
but inefficient way by using the pair of rules (R51) : h(0, A, B, C) -> [] and
(R52) : h(s(N), A, B, C) -> app(h(N, A, C, B), [mov(A, B)|h(N, C, B, A)]) (together
with rules defining the typical app function for concatenating lists). This
program, with exponential complexity due to the two recursive calls in the rhs
of rule R52, can be optimized by tupling as follows:
1. Definition Introduction:
   (R53) : new8(N, A, B, C) -> (h(N, A, C, B), h(N, C, B, A))

2. During the unfolding phase, we generate the pair of rules:

   (R54) : new8(0, A, B, C) -> ([], [])                                   SC1!
   (R55) : new8(s(N), A, B, C) -> (app(h(N, A, B, C), [mov(A, C)|h(N, B, C, A)]),
                                   app(h(N, C, A, B), [mov(C, B)|h(N, A, B, C)]))   SC3!

At this point R54 represents a base case definition for new8. Regarding R55, we
observe that its rhs contains four calls (sharing common variables) to h, where
only three of them are different. So, if we force a premature application of
abstraction+folding steps, the recursive definition of new8 would have two calls:
one to new8 and one to h, which does not represent an enhanced definition for
new8. Instead, it is more appropriate to start a new tupling process in order
to optimize R55, since now our eureka generator is able to produce an eureka
with three calls (patterns sharing common variables), as desired. Hence, our
incremental algorithm simply consists in generating a new tupling cycle when the
stopping condition SC3 described in Table 2 is reached. In our example, the new
tupling process will consider as initial eureka the new rule
(R56) : new9(N, A, B, C) -> (h(N, A, B, C), h(N, B, C, A), h(N, C, A, B)). Moreover,
the unfolding phase will generate a new pair of rules:

(R57) : new9(0, A, B, C) -> ([], [], [])                                  SC1!
(R58) : new9(s(N), A, B, C) -> (app(h(N, A, C, B), [mov(A, B)|h(N, C, B, A)]),
                                app(h(N, B, A, C), [mov(B, C)|h(N, A, C, B)]),
                                app(h(N, C, B, A), [mov(C, A)|h(N, B, A, C)]))   SC2!

At this moment, the unfolding process must finish, since R57 represents a base
case definition for new9 and R58 satisfies the stopping condition SC2
(foldability) in Table 2. Now, we proceed with the folding phase, obtaining:

(R59) : new9(s(N), A, B, C) -> (app(Z1, [mov(A, B)|Z2]), app(Z3, [mov(B, C)|Z1]),
                                app(Z2, [mov(C, A)|Z3]))
                               where (Z1, Z2, Z3) = new9(N, A, C, B)

(R60) : new8(s(N), A, B, C) -> (app(Z1, [mov(A, C)|Z2]), app(Z3, [mov(C, B)|Z1]))
                               where (Z1, Z2, Z3) = new9(N, A, B, C)

Thanks to this re-application of the whole tupling algorithm, we have obtained
improved definitions for both eurekas. Then, we can resume the first tupling loop
at the point where we left it, in order to reuse the enhanced definition of new8
when redefining h (by appropriately abstracting and folding R52) as:
h(s(N), A, B, C) -> app(Z1, mov(A, B) : Z2) where (Z1, Z2) = new8(N, A, B, C).
Now, this last rule, together with R51, R54, R57, R59 and R60, represents an
improvement w.r.t. the original program thanks to the chained use of enhanced
definitions for the new symbols new8 and new9.
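The chained eurekas of Example 5 can be transcribed into Python to check that the optimized program computes the same moves as the original one. This is a sketch under stated assumptions: moves mov(A, B) are modelled as pairs `(a, b)`, lists as Python lists with `+` in place of app, and the names `h_naive` and `h_tupled` are chosen here for illustration.

```python
def h_naive(n, a, b, c):
    """R51/R52: two recursive calls per level."""
    if n == 0:
        return []
    return h_naive(n - 1, a, c, b) + [(a, b)] + h_naive(n - 1, c, b, a)


def new9(n, a, b, c):
    """R57/R59: returns (h(n,a,b,c), h(n,b,c,a), h(n,c,a,b))."""
    if n == 0:
        return ([], [], [])
    z1, z2, z3 = new9(n - 1, a, c, b)
    return (z1 + [(a, b)] + z2,
            z3 + [(b, c)] + z1,
            z2 + [(c, a)] + z3)


def new8(n, a, b, c):
    """R54/R60: returns (h(n,a,c,b), h(n,c,b,a)), defined through new9."""
    if n == 0:
        return ([], [])
    z1, z2, z3 = new9(n - 1, a, b, c)
    return (z1 + [(a, c)] + z2, z3 + [(c, b)] + z1)


def h_tupled(n, a, b, c):
    """Folded R52: h(s(N),A,B,C) -> app(Z1, mov(A,B) : Z2) where (Z1,Z2) = new8(N,A,B,C)."""
    if n == 0:
        return []
    z1, z2 = new8(n - 1, a, b, c)
    return z1 + [(a, b)] + z2
```

In this version each level makes a single recursive call, so the number of calls grows linearly with n; the resulting list of moves still has 2^n - 1 elements, so concatenation cost remains tied to the output size.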

To finish this section, we remark that the massive use of function TEST,
together with the extensive exploration of patterns sharing common variables, are
the most significant and distinctive points of our improved approach to
automatic tupling. Similarly to other heuristics shown in the literature, we
assume that the algorithm aborts when it exceeds a maximal number of allowed
unfolding and tupling iterations (i.e., when not enough stopping conditions SC1
(base cases) and SC2 (foldable regularities) have been found).



5 Conclusions 

Tupling is a powerful optimization strategy which can be achieved by fold/unfold
transformations and produces better gains in efficiency than other simpler ones
such as composition. As is well known in the literature, tupling is very
complicated, and many automatic tupling algorithms either result in high runtime
cost or succeed only for a restricted class of programs. Starting with fully
automatic composition and tupling algorithms that we have previously designed in
recent works, our approach refines them and removes some of these limitations.

It is important to remark that the idea of considering only patterns sharing
common variables has very nice properties: it is a very easy, purely syntactic
method that can be efficiently repeated along the three transformation phases,
i.e., not only for generating eureka definitions but also during the unfolding
phase while searching for regularities and, finally, for the application of
abstraction and folding steps. Moreover, it is also useful for testing when a new
tupling cycle must be initiated, hence producing chained definitions of eureka
symbols.

We have also shown how to increase both the speed-up of the process and
the class of programs to be optimized by simply performing some simplification
pre-processes on the original program before applying tupling. In particular,
besides being useful during the unfolding phase, normalization also generates
programs more amenable to further optimizations. Similarly, composition
is not only interesting for producing important optimizations on a given
program, but also for giving the tupling strategy more chances to proceed
successfully.

Abstract. Given a knowledge base $\Sigma$ and a formula $F$, both in
propositional Conjunctive Form, we address the problem of designing efficient
procedures to compute the degree of belief in $F$ with respect to $\Sigma$ as the
conditional probability $P_{F|\Sigma}$. Applying a general approach based on
probabilistic logic for computing the degree of belief $P_{F|\Sigma}$, we can
determine classes of conjunctive formulas for $\Sigma$ and $F$ in which
$P_{F|\Sigma}$ can be computed efficiently. It is known that the complexity of
computing $P_{F|\Sigma}$ is polynomially related to the complexity of solving the
#SAT problem for the formula $\Sigma \wedge F$. Therefore, some of the above
classes in which $P_{F|\Sigma}$ is computed efficiently establish new polynomial
classes, given by $\Sigma \cup F$, for the #SAT problem and, consequently, for
many other related counting problems.

Keywords: #SAT Problem, Degree of Belief, Updating Beliefs, Approx- 
imate Reasoning. 



1 Introduction 

The general approach used here to compute the degree of belief consists of
assigning an equal degree of belief to all basic "situations". In this manner, we
can compute the probability that $\Sigma$ (an initial knowledge base which
involves $n$ variables) will be satisfied, denoted by $P_\Sigma$, as:
$P_\Sigma = Prob(\Sigma \equiv \top) = \frac{\mu(\Sigma)}{2^n}$,
where $\top$ stands for the Truth value, $Prob$ is used to denote the
probability, and $\mu(\Sigma)$ denotes the number of models that $\Sigma$ has.

We are interested in the computational complexity of computing the degree
of belief in a propositional formula $F$ with respect to $\Sigma$, taken as the
fraction of models of $\Sigma$ that are consistent with the query $F$, that is,
the conditional probability of $F$ with respect to $\Sigma$, denoted by
$P_{F|\Sigma}$ and computed as:
$P_{F|\Sigma} = Prob((\Sigma \wedge F) \equiv \top \mid \Sigma \equiv \top) = \frac{\mu(\Sigma \wedge F)}{\mu(\Sigma)}$.
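
To make the two definitions above concrete, the following Python sketch counts models by exhaustive enumeration and forms the ratio $\mu(\Sigma \wedge F)/\mu(\Sigma)$. It is exponential in the number of variables and is only an illustration, not one of the efficient procedures of this paper. The encoding of a literal as a pair `(variable, sign)` (sign 1 for a positive literal, 0 for a negated one) and the names `count_models` and `degree_of_belief` are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

def count_models(clauses, variables):
    """Number of assignments over `variables` that satisfy every clause."""
    variables = sorted(variables)
    total = 0
    for bits in product((0, 1), repeat=len(variables)):
        value = dict(zip(variables, bits))
        # A clause is satisfied when at least one literal has its sign.
        if all(any(value[v] == sign for v, sign in clause) for clause in clauses):
            total += 1
    return total

def degree_of_belief(sigma, f):
    """mu(sigma and f) / mu(sigma), both counted over v(sigma) union v(f)."""
    variables = {v for clause in sigma + f for v, _ in clause}
    mu_sigma = count_models(sigma, variables)
    if mu_sigma == 0:
        raise ValueError("the knowledge base is unsatisfiable")
    return Fraction(count_models(sigma + f, variables), mu_sigma)

sigma = [[("a", 1), ("b", 1)]]   # knowledge base: (a or b)
query = [[("a", 1)]]             # query: (a)
print(degree_of_belief(sigma, query))   # 2/3
```

Counting both numerator and denominator over the same variable set keeps the ratio meaningful when the query mentions variables that do not occur in the knowledge base.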

Conditional probabilities are key for reasoning because they formalize the
process of accumulating evidence and updating probabilities based on new
evidence. We will use probability theory as a formal means of manipulating
degrees of belief. The inference of a degree of belief is a generalization of
deductive inference which can be used when the knowledge base is augmented
by, e.g., statistical information, or in an effort to avoid the computationally
hard task of deductive inference [8].

As is known, the complexity of computing the degree of belief $P_{F|\Sigma}$ is
polynomially related to the complexity of counting the number of models of
$\Sigma \wedge F$, which directs us to the #SAT problem.

The #SAT problem consists of counting the number of satisfying assignments
for a given propositional formula. #SAT is at least as hard as the decision
problem SAT, but in many cases, even when SAT is solved in polynomial time,
no computationally efficient method is known for counting the number of distinct
solutions.

For example, the 2-SAT problem (SAT restricted to conjunctions of 2-clauses,
i.e. binary clauses) can be solved in linear time. However, the corresponding
counting problem #2-SAT is a #P-complete problem. This also applies to the
restrictions of #SAT to monotone 2-clauses, 2-HORN, and $(2, 3\mu)$-CF
(conjunctions of 2-clauses where each variable appears at most three times),
which are all #P-complete problems [8,9].

The maximal class of conjunctive forms where #SAT can be solved in polynomial
time is the class $(2, 2\mu)$-CF (conjunctions of binary clauses where each
variable appears at most two times) [8,9]. In order to efficiently compute the
degree of belief in $F$, given an initial knowledge base $\Sigma$, we therefore
consider $\Sigma$ to be a $(2, 2\mu)$-CF. In this sense, we develop in Section 3
an efficient procedure for solving #SAT for formulas in $(2, 2\mu)$-CF.
Furthermore, we analyze how far we can extend the computation of the degree of
belief $P_{F|\Sigma}$ while remaining within the class of efficient procedures.

The research presented here follows the line pointed out by Eiter and Gottlob 
[3], Papadimitriou [5], Darwiche [2], Zanuttini [10] and many others who have 
analyzed problems arising from deductive inference, such as searching for expla- 
nations, approximate reasoning and computing the degree of belief. These works 
try to differentiate the classes of Boolean formulas where such problems can be 
solved efficiently from those classes where such problems present an exponential 
complexity. 



2 Notation and Preliminaries 

Let $X = \{x_1, \ldots, x_n\}$ be a set of $n$ Boolean variables. A literal is
either a variable $x$ or a negated variable $\bar{x}$. The pair $\{x, \bar{x}\}$
is complementary. We denote by $\bar{l}$ the negation of the literal $l$. We use
$v(l)$ to indicate the variable involved in the literal $l$. As usual, for each
$x \in X$, $x^0 = \bar{x}$ and $x^1 = x$. We use $\bot$ and $\top$ as two
constants in the language which represent the truth values false and true,
respectively.

A clause is a disjunction of literals. For $k \in \mathbb{N}$, a $k$-clause is a
clause consisting of exactly $k$ literals, and a $(\le k)$-clause is a clause
with at most $k$ literals. A unitary clause has just one literal and a binary
clause has exactly two literals. A variable $x \in X$ appears in a clause $c$ if
either $x$ or $\bar{x}$ is an element of $c$. Let
$v(c) = \{x \in X \mid x \text{ appears in } c\}$.

A conjunctive form (CF) is a conjunction of clauses (we will also consider a
CF as a set of clauses). A $k$-CF is a CF containing only $k$-clauses, and
$(\le k)$-CF denotes a CF containing clauses with at most $k$ literals. A
$k\mu$-CF is a formula in which no variable occurs more than $k$ times. A
$(k, s\mu)$-CF is a $k$-CF such that each variable appears no more than $s$
times; similarly, a $(\le k, s\mu)$-CF is a $(\le k)$-CF where each variable
appears at most $s$ times. In this sense we have a hierarchy given by the number
of occurrences per variable, where $(k, s\mu)$-CF is a restriction of
$(k, (s+1)\mu)$-CF, and a hierarchy given by the number of literals per clause,
where $(\le k, s\mu)$-CF is a restriction of $(\le (k+1), s\mu)$-CF.

For any CF $F$, let $v(F) = \{x \in X \mid x \text{ appears in } F\}$. On the
other hand, a CF $F$ is of the type $F(m, n)$ if $F$ consists of $m$ clauses
which involve $n$ variables.

An assignment $s$ for $F$ is a function $s : v(F) \to \{0, 1\}$. An assignment
can also be considered as a set of literals containing no complementary pair of
literals. If $l$ is an element of an assignment, then the assignment makes $l$
true and makes $\bar{l}$ false. A clause $c$ is satisfied by the assignment $s$
if and only if $c \cap s \neq \emptyset$; otherwise we say that $c$ is
contradicted or falsified by $s$.

A CF $F$ is satisfied by an assignment $s$ if each clause in $F$ is satisfied by
$s$. $F$ is contradicted by $s$ if any clause in $F$ is contradicted by $s$. A
model of $F$ is an assignment over $v(F)$ that satisfies $F$. Let $M(F)$ be the
set of models that $F$ has over $v(F)$. $F$ is a contradiction, or
unsatisfiable, if $M(F) = \emptyset$. Let $\mu_{v(F)}(F) = |M(F)|$ be its
cardinality. Let $K_{v(F)}(F)$ be the set of assignments over $v(F)$ which
falsify $F$. When $v(F)$ is clear from the context, we omit it as a subscript.

If $F_1 \subseteq F$ is a formula consisting of some clauses of $F$, then
$v(F_1) \subseteq v(F)$. An assignment over $v(F_1)$ is a partial assignment
over $v(F)$. Indeed, any assignment over $v(F_1)$ has extensions as assignments
over $v(F)$.

Let #LANG-SAT be the notation for the #SAT problem for propositional
formulas in the class LANG-CF, e.g. #2-SAT denotes #SAT for formulas in
2-CF, while #$(2, 2\mu)$-SAT denotes #SAT for formulas in the class
$(2, 2\mu)$-CF.

3 Counting the Number of Models of a Formula 

Let $\Sigma$ be a CF. If $\mathcal{F} = \{F_1, \ldots, F_r\}$ is a partition of
$\Sigma$ (over the set of clauses appearing in $\Sigma$), i.e.
$\bigcup_{\rho=1}^{r} F_\rho = \Sigma$ and
$\forall \rho_1, \rho_2 \in [1, r]$, $[\rho_1 \neq \rho_2 \Rightarrow F_{\rho_1} \cap F_{\rho_2} = \emptyset]$,
we say that $\mathcal{F}$ is a partition in connected components of $\Sigma$ if
$\mathcal{V} = \{v(F_1), \ldots, v(F_r)\}$ is a partition of $v(\Sigma)$.

If $\{F_1, \ldots, F_r\}$ is a partition in connected components of $\Sigma$,
then:

$$\mu_{v(\Sigma)}(\Sigma) = [\mu_{v(F_1)}(F_1)] \cdot \ldots \cdot [\mu_{v(F_r)}(F_r)] \qquad (1)$$

In order to compute $\mu(\Sigma)$, we should first determine the set of connected
components of $\Sigma$, and this procedure can be done in linear time [1,9]. If
$\Sigma$ is a 2-CF, it is easy to find its connected components. For this,
let the undirected clause-graph $G_\Sigma$ be given as follows: each clause of
$\Sigma$ is a node in $G_\Sigma$, and there is an edge $(c, d)$ labeled by $x$ if
$x$ is the common variable between the clauses $c$ and $d$, i.e.
$v(c) \cap v(d) = \{x\}$. The different connected components of $G_\Sigma$ form
the partition of $\Sigma$ into its connected components. From now on, when we
mention a formula $\Sigma$, we suppose that all of its clauses are connected
(there is a path connecting any two clauses of $\Sigma$).
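
The decomposition and the product in Equation (1) can be sketched as follows. This is a minimal illustration, assuming clauses are lists of `(variable, sign)` pairs. The union-find grouping used here is near-linear rather than the linear-time procedure cited above, and the per-component counts are obtained by brute force purely to check the product. The names `components` and `count_models` are hypothetical.

```python
from itertools import product

def components(clauses):
    """Group clauses so that clauses sharing a variable end up together."""
    parent = list(range(len(clauses)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    first_seen = {}   # variable -> index of the first clause containing it
    for i, clause in enumerate(clauses):
        for var, _ in clause:
            if var in first_seen:
                parent[find(i)] = find(first_seen[var])
            else:
                first_seen[var] = i
    groups = {}
    for i, clause in enumerate(clauses):
        groups.setdefault(find(i), []).append(clause)
    return list(groups.values())

def count_models(clauses):
    """Brute-force model count over the variables occurring in `clauses`."""
    variables = sorted({v for c in clauses for v, _ in c})
    return sum(
        all(any(value[v] == s for v, s in c) for c in clauses)
        for bits in product((0, 1), repeat=len(variables))
        for value in [dict(zip(variables, bits))]
    )

sigma = [[("a", 1), ("b", 0)], [("b", 1), ("c", 1)], [("d", 0), ("e", 1)]]
parts = components(sigma)
product_of_parts = 1
for part in parts:
    product_of_parts *= count_models(part)
print(len(parts), product_of_parts, count_models(sigma))   # 2 12 12
```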

3.1 An Algorithm for #(2, 2p)-SAT 

The problem of computing #$(2, 2\mu)$-SAT reduces to designing procedures for
computing the number of models of each type of connected component. We
suppose that $G_\Sigma$ is the clause-graph of a connected component given by
$\Sigma$, and we present the different cases for computing $\mu(\Sigma)$.

i) If $G_\Sigma$ consists of just one node, its associated subformula $\Sigma$
consists of just one binary clause and $\mu(\Sigma) = 3$.

ii) If $\Sigma = \{c, c'\}$ where $v(c) = v(c')$, then $G_\Sigma$ is a cycle with
two nodes and $\mu(\Sigma) = 2$, since exactly two of the four assignments over
$v(c)$ falsify $c$ or $c'$.

iii) Let $G_\Sigma = (V, E)$ be a linear chain, where
$|V| = |\{c_1, \ldots, c_m\}| = m$ and $|v(c_i) \cap v(c_{i+1})| = 1$,
$i = 1, \ldots, m-1$. Let us write $\Sigma$, without loss of generality
(ordering the clauses and their literals, if necessary), as:

$$\Sigma = \{\{y_0^{\epsilon_1}, y_1^{\delta_1}\}, \{y_1^{\epsilon_2}, y_2^{\delta_2}\}, \ldots, \{y_{m-1}^{\epsilon_m}, y_m^{\delta_m}\}\},$$

where $\delta_i, \epsilon_i \in \{0, 1\}$, $i = 1, \ldots, m$.
We compute $\mu(\Sigma)$ by calculating the value $\mu(f_i)$ in each step, where
$f_i$ is a family of clauses of $\Sigma$ built as follows:
$f_i = \{c_j\}_{j \le i}$, $i = 1, \ldots, m$. Then $f_i \subset f_{i+1}$,
$i = 1, \ldots, m-1$. Let $M(f_i) = \{$assignments over $v(f_i)$ satisfying
$f_i\}$, $A_i = \{s \in M(f_i) \mid y_i \in s\}$ and
$B_i = \{s \in M(f_i) \mid \bar{y}_i \in s\}$. And let $\alpha_i = |A_i|$,
$\beta_i = |B_i|$, $\mu_i = |M(f_i)| = \alpha_i + \beta_i$.

We calculate the pair $(\alpha_i, \beta_i)$ in each step $i \in [1, m]$ according
to the signs $(\epsilon_i, \delta_i)$ of the literals in the clause $c_i$, as
follows. For the first clause the initial pair $(\alpha_1, \beta_1)$ is:

$$(\alpha_1, \beta_1) = \begin{cases} (1, 2) & \text{if } \delta_1 = 0, \\ (2, 1) & \text{if } \delta_1 = 1. \end{cases}$$

Recurrently, for $i \in [2, m]$, we determine the values $(\alpha_i, \beta_i)$ as:

$$(\alpha_i, \beta_i) = \begin{cases}
(\beta_{i-1},\ \alpha_{i-1} + \beta_{i-1}) & \text{if } (\epsilon_i, \delta_i) = (0, 0) \\
(\alpha_{i-1} + \beta_{i-1},\ \beta_{i-1}) & \text{if } (\epsilon_i, \delta_i) = (0, 1) \\
(\alpha_{i-1},\ \alpha_{i-1} + \beta_{i-1}) & \text{if } (\epsilon_i, \delta_i) = (1, 0) \\
(\alpha_{i-1} + \beta_{i-1},\ \alpha_{i-1}) & \text{if } (\epsilon_i, \delta_i) = (1, 1)
\end{cases} \qquad (2)$$

As $\Sigma = f_m$ and $|M(f_i)| = \mu_i = \alpha_i + \beta_i$, then
$\mu(\Sigma) = |M(f_m)| = \mu_m = \alpha_m + \beta_m$.

Example 1. Suppose that $F$ is a monotone 2-CF and $G_F$ is a linear chain, i.e.
$F = \{(x_0, x_1), (x_1, x_2), \ldots, (x_{m-1}, x_m)\}$.
$(\alpha_1, \beta_1) = (2, 1)$ since $\delta_1 = 1$.

$\mu_1 = \alpha_1 + \beta_1 = 2 + 1 = 3$,

$\mu_2 = \alpha_2 + \beta_2 = (\alpha_1 + \beta_1) + \alpha_1 = \mu_1 + \alpha_1 = 3 + 2 = 5$,

$\forall i > 2: \mu_i = \alpha_i + \beta_i = (\alpha_{i-1} + \beta_{i-1}) + \alpha_{i-1} = \mu_{i-1} + \alpha_{i-1} = \mu_{i-1} + \mu_{i-2}$.

The Fibonacci series appears! Applying the Fibonacci series until $m = 5$,
we obtain the values $(\alpha_i, \beta_i)$, $i = 1, \ldots, 5$:
$(2, 1) \to (3, 2) \to (5, 3) \to (8, 5) \to (13, 8)$ and
$\mu(F) = \mu_5 = 21$.
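
The linear-chain procedure of case (iii) can be sketched in a few lines of Python. This is an illustrative sketch, assuming the chain is given as a list of sign pairs $(\epsilon_i, \delta_i)$, one per clause $\{y_{i-1}^{\epsilon_i}, y_i^{\delta_i}\}$. The names `chain_pairs` and `brute_force_chain` are hypothetical, and the brute-force counter is included only as a cross-check.

```python
from itertools import product

def chain_pairs(signs):
    """Return the pairs (alpha_i, beta_i) of recurrence (2) for a linear chain."""
    _, delta = signs[0]
    alpha, beta = (2, 1) if delta == 1 else (1, 2)
    pairs = [(alpha, beta)]
    for eps, delta in signs[1:]:
        total = alpha + beta
        # When y_i falsifies its own literal, y_{i-1} must satisfy the clause.
        forced = alpha if eps == 1 else beta
        alpha, beta = (total, forced) if delta == 1 else (forced, total)
        pairs.append((alpha, beta))
    return pairs

def brute_force_chain(signs):
    """Count models of the same chain by enumerating y_0, ..., y_m."""
    m = len(signs)
    return sum(
        all(y[i] == eps or y[i + 1] == delta
            for i, (eps, delta) in enumerate(signs))
        for y in product((0, 1), repeat=m + 1)
    )

monotone = [(1, 1)] * 5            # the chain of Example 1 with m = 5
pairs = chain_pairs(monotone)
print(pairs)                       # [(2, 1), (3, 2), (5, 3), (8, 5), (13, 8)]
print(sum(pairs[-1]))              # 21
```

Each step uses only the previous pair, so the whole count takes a number of additions linear in $m$.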



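iv) Let $G_\Sigma = (V, E)$ be a cycle of $m$ nodes; then all the variables in
$v(\Sigma)$ appear two times, and $|V| = m = n = |E|$. We order the clauses in
$\Sigma$ in such a way that $|v(c_i) \cap v(c_{i+1})| = 1$, and
$c_{i_1} = c_{i_2}$ whenever $i_1 \equiv i_2 \bmod m$, hence $y_0 = y_m$. Then
$\Sigma = \{c_i = \{y_{i-1}^{\epsilon_i}, y_i^{\delta_i}\}\}_{i=1}^{m}$, where
$\delta_i, \epsilon_i \in \{0, 1\}$.

The first clause $c_1 = \{y_0^{\epsilon_1}, y_1^{\delta_1}\}$ has three
satisfying assignments, given by
$s_1 = \{y_0^{1-\epsilon_1}, y_1^{\delta_1}\}$,
$s_2 = \{y_0^{\epsilon_1}, y_1^{1-\delta_1}\}$ and
$s_3 = \{y_0^{\epsilon_1}, y_1^{\delta_1}\}$. For each of them we define:

$$(\alpha_1, \beta_1)(s_1) = \begin{cases} (1, 0) & \text{if } \delta_1 = 1, \\ (0, 1) & \text{otherwise.} \end{cases}$$

$$(\alpha_1, \beta_1)(s_2) = \begin{cases} (0, 1) & \text{if } \delta_1 = 1, \\ (1, 0) & \text{otherwise.} \end{cases}$$

$$(\alpha_1, \beta_1)(s_3) = \begin{cases} (1, 0) & \text{if } \delta_1 = 1, \\ (0, 1) & \text{otherwise.} \end{cases}$$

For each $\lambda = 1, 2, 3$ calculate $(\alpha_i, \beta_i)(s_\lambda)$, with
$i = 2, \ldots, m$, as indicated by the recurrence (2), and let

$$t(s_\lambda) = \begin{cases} \alpha_m & \text{if } y_m \in s_\lambda, \\ \beta_m & \text{if } \bar{y}_m \in s_\lambda. \end{cases}$$

Then $\mu(\Sigma) = t(s_1) + t(s_2) + t(s_3)$.

For $i = 1, \ldots, m-1$ we associate each pair $(\alpha_i, \beta_i)$ with the
label $y_i$, which is the common variable between the clauses $c_i$ and
$c_{i+1}$. Notice that if $G_\Sigma$ is a linear chain then $y_m$ is the label of
$(\alpha_m, \beta_m)$ and $y_0$ is not the label of any pair, while if $G_\Sigma$
is a cycle then every variable in $\Sigma$ appears as the label of some pair
$(\alpha_i, \beta_i)$, including $y_0 = y_m$, associated to
$(\alpha_m, \beta_m)$.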



Example 2. Let $\Sigma = \{(\bar{y}_0, y_1), (\bar{y}_1, y_2), (\bar{y}_2, y_3), (\bar{y}_3, y_4), (\bar{y}_4, y_0)\}$
be an implicative 2-CF. As $c_1 = (\bar{y}_0, y_1)$, then:
$s_1 = (y_0, y_1)$; $s_2 = (\bar{y}_0, \bar{y}_1)$; $s_3 = (\bar{y}_0, y_1)$, and

$(\alpha_1, \beta_1)(s_1) = (1, 0) \to (\alpha_2, \beta_2)(s_1) = (1, 0) \to (\alpha_3, \beta_3)(s_1) = (1, 0) \to (\alpha_4, \beta_4)(s_1) = (1, 0) \to (\alpha_5, \beta_5)(s_1) = (1, 0)$,
and $t(s_1) = \alpha_5 = 1$ since $y_0 \in s_1$.

$(\alpha_1, \beta_1)(s_2) = (0, 1) \to (\alpha_2, \beta_2)(s_2) = (1, 1) \to (\alpha_3, \beta_3)(s_2) = (2, 1) \to (\alpha_4, \beta_4)(s_2) = (3, 1) \to (\alpha_5, \beta_5)(s_2) = (4, 1)$,
then $t(s_2) = \beta_5 = 1$ since $\bar{y}_0 \in s_2$.

And the series $(\alpha_i, \beta_i)$, $i = 2, \ldots, 5$, is the same for $s_3$
as for $s_1$; then $(\alpha_5, \beta_5)(s_3) = (1, 0)$, and
$t(s_3) = \beta_5 = 0$ since $\bar{y}_0 \in s_3$. Finally,
$\mu(\Sigma) = t(s_1) + t(s_2) + t(s_3) = 1 + 1 + 0 = 2$.
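
The cycle case can be sketched in the same style. The code below is a minimal illustration under assumed conventions: `signs[i]` holds $(\epsilon, \delta)$ for the clause joining $y_i$ and $y_{i+1}$, with indices taken modulo $m$. It runs recurrence (2) once for each of the three assignments that satisfy the first clause and keeps the component that agrees with the value of $y_0 = y_m$. The names `step`, `count_cycle` and `brute_force_cycle` are hypothetical.

```python
from itertools import product

def step(pair, eps, delta):
    """One application of recurrence (2)."""
    alpha, beta = pair
    total = alpha + beta
    forced = alpha if eps == 1 else beta
    return (total, forced) if delta == 1 else (forced, total)

def count_cycle(signs):
    """Model count of a cycle of binary clauses, as in case (iv)."""
    eps1, delta1 = signs[0]
    total = 0
    for y0, y1 in product((0, 1), repeat=2):
        if y0 != eps1 and y1 != delta1:
            continue                      # this assignment falsifies c_1
        pair = (1, 0) if y1 == 1 else (0, 1)
        for eps, delta in signs[1:]:
            pair = step(pair, eps, delta)
        # The last label is y_m = y_0, so keep the count that matches y0.
        total += pair[0] if y0 == 1 else pair[1]
    return total

def brute_force_cycle(signs):
    """Cross-check by enumerating all assignments of y_0, ..., y_{m-1}."""
    m = len(signs)
    return sum(
        all(y[i] == eps or y[(i + 1) % m] == delta
            for i, (eps, delta) in enumerate(signs))
        for y in product((0, 1), repeat=m)
    )

implicative = [(0, 1)] * 5        # the cycle of Example 2
print(count_cycle(implicative))   # 2
```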

As can be seen, the procedures for computing $\mu(\Sigma)$, $\Sigma$ being a
$(2, 2\mu)$-CF, have polynomial time complexity, since these procedures are based
on applying (2) each time that a node of a connected component is visited. There
are other procedures for computing $\mu(\Sigma)$ when $\Sigma$ is a
$(2, 2\mu)$-CF [8,9], but those proposals do not distinguish the models in which
a variable $x$ takes value 1 from those models in which the same variable $x$
takes value 0, a distinction which is made explicit in our procedure through the
pair $(\alpha, \beta)$ labeled by $x$. This distinction over the set of models of
$\Sigma$ is essential when we want to count the new set of models for $\Sigma$
union a new formula $F$, as we will see in the next sections.



4 Efficient Computing of the Degree of Belief 



Let $\Sigma$ be a $(2, 2\mu)$-CF. We assume that $\Sigma$ is consistent, so
$\mu(\Sigma) > 0$ and $P_{F|\Sigma} = \frac{\mu(\Sigma \wedge F)}{\mu(\Sigma)}$
is well-defined. We will show here how to compute $\mu(\Sigma \wedge F)$; for
this, we consider the different cases for $F$.

Let $F = \{(l)\}$, where $v(l) \in v(\Sigma)$. We can suppose that we have
ordered the clauses in $\Sigma$ as:
$\Sigma = \{c_i = \{y_{i-1}^{\epsilon_i}, y_i^{\delta_i}\}\}_{i=1}^{m}$,
$\delta_i, \epsilon_i \in \{0, 1\}$, $v(c_i) \cap v(c_{i+1}) = \{y_i\}$.
In order to compute $\mu(\Sigma \wedge (l))$, while we are applying (2), we
determine the index $j$, $1 \le j \le m$, such that the label associated to
$(\alpha_j, \beta_j)$ is $v(l)$, that is, $v(l) = y_j$, and we modify
$(\alpha_j, \beta_j)$ as:

$$(\alpha_j, \beta_j) = \begin{cases} (0, \beta_j) & \text{if } l \text{ appears negative in } F, \\ (\alpha_j, 0) & \text{otherwise.} \end{cases}$$

Notice that if $G_\Sigma$ is a cycle, the above modification is applied to each
of the three pairs $(\alpha_j, \beta_j)(s_k)$, $k = 1, 2, 3$. The case
$(0, \beta_j)$ results because we are considering that the unitary clause
$(\overline{v(l)})$ belongs to $F$, and then $v(l)$ cannot be set to true in any
model of $\Sigma \wedge F$. Similarly, $(\alpha_j, 0)$ comes from considering
that $(v(l))$ appears in $F$, and then $v(l)$ cannot be set to false in any model
of $\Sigma \wedge F$.

On the other hand, suppose that $G_\Sigma$ is a linear chain and $v(l)$ is in the
first clause ($v(l) \in v(c_1)$) but is not the label of $(\alpha_1, \beta_1)$,
that is, $v(l) = y_0$. Supposing $F = \{(l^{e})\}$, $e \in \{0, 1\}$, we
determine the pair $(\alpha_1, \beta_1)$ as:

$$(\alpha_1, \beta_1) = \begin{cases} (1, 1) & \text{if } e = \epsilon_1, \\ (1, 0) & \text{if } e \neq \epsilon_1 \text{ and } \delta_1 = 1, \\ (0, 1) & \text{if } e \neq \epsilon_1 \text{ and } \delta_1 = 0. \end{cases}$$

With this modification, the application of the recurrence (2) continues until
we obtain the last pair $(\alpha_m, \beta_m)$, and we proceed as described in
(iii) or (iv) according to whether $G_\Sigma$ is a linear chain or a cycle.

Notice that if $\Sigma \wedge F$ is unsatisfiable then $P_{F|\Sigma} = 0$. Since
we are not updating $\Sigma$ with the information of $F$, we do not have to
update or revise the knowledge base $\Sigma$, which is the most common procedure
in update and belief revision, and then $P_{F|\Sigma}$ continues to be
well-defined.

If $F = \{(l)\}$ and $v(l) \notin v(\Sigma)$, then, as we have considered the
degree of belief in $F$ given $\Sigma$ as a conditional probability, it makes
sense to update the degree of belief by updating the probability space where
$P_{F|\Sigma}$ is computed [6].



4.1 Updating Probabilities and Degrees of Belief 

When new pieces of information that did not originally appear in the sample
space must be considered, we enter the area of updating the degree of
belief by extending the original probability space [4].

Let $F = (\bigwedge_{j=1}^{k} l_j)$ be a conjunction of literals where there are
variables that do not appear in the original knowledge base $\Sigma$. Let
$A = \{l \in F : v(l) \notin v(\Sigma)\}$, $t = |A|$ and $|v(\Sigma)| = n$.
We consider $F$ as a set of literals (the conjunction




436 



G. De Ita Luna 



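is understood between the elements of the set), and let $F' = F - A$. There are
$2^n$ assignments defined over $v(\Sigma)$ and $2^{n+t}$ assignments defined over
$v(\Sigma) \cup v(F)$, so we update the domain of the probability space over
which we compute $P_{F|\Sigma}$ as:

$$P_{F|\Sigma} = \frac{Prob(\Sigma \wedge F)}{Prob(\Sigma)} = \frac{\mu(\Sigma \wedge F) / 2^{n+t}}{\mu(\Sigma) / 2^{n}} = \frac{\mu(\Sigma \wedge F)}{2^{t} \cdot \mu(\Sigma)}.$$

Since $G_A$ and $G_{\Sigma \cup F'}$ are two independent connected components and
$\mu(\bigwedge_{l \in A} l) = \prod_{l \in A} \mu(l) = 1$, then:

$$P_{F|\Sigma} = \frac{\mu(\Sigma \wedge F') \cdot \mu(\bigwedge_{l \in A} l)}{2^{t} \cdot \mu(\Sigma)} = \frac{\mu(\Sigma \wedge F')}{2^{t} \cdot \mu(\Sigma)} \qquad (3)$$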



Example 3. Let $\Sigma = \{(x_0, x_1), (x_1, x_2), (x_2, x_3), (x_3, x_4), (x_4, x_5)\}$,
the formula from Example 1, and $F = \{x_0, \bar{x}_3, x_6\}$.
$A = \{l \in F : v(l) \notin v(\Sigma)\} = \{x_6\}$, and $t = |A| = 1$. As $x_0$
appears with the same sign in both $F$ and the tail of the chain $G_\Sigma$,
$(\alpha_1, \beta_1) = (1, 1)$.
$(\alpha_2, \beta_2) = (\alpha_1 + \beta_1, \alpha_1) = (2, 1) \to (3, 2) = (\alpha_3, \beta_3)$,
but this last pair must be changed, since $v(x_3)$ is the label of
$(\alpha_3, \beta_3)$ and appears as a negated variable in $F$; then
$(\alpha_3, \beta_3) = (0, \beta_3) = (0, 2)$.
$(\alpha_4, \beta_4) = (\alpha_3 + \beta_3, \alpha_3) = (2, 0) \to (2, 2) = (\alpha_5, \beta_5)$.
And $\mu_5 = 4 = \mu(\Sigma \wedge F) = \mu(\Sigma \wedge F') = \mu(\Sigma \wedge (x_0) \wedge (\bar{x}_3))$,
according to (3), since $x_6$ does not appear in $\Sigma$. Therefore

$$P_{F|\Sigma} = \frac{\mu(\Sigma \wedge F)}{2^{t} \cdot \mu(\Sigma)} = \frac{\mu(\Sigma \wedge (x_0) \wedge (\bar{x}_3) \wedge (x_6))}{2^{t} \cdot \mu(\Sigma)} = \frac{4}{2 \cdot 21} = \frac{4}{42}.$$
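
The computation of Example 3 can be reproduced with a short sketch that combines recurrence (2) with the unit-literal modifications of Section 4 and then applies Equation (3). The representation is an assumption of this sketch: the chain is a list of sign pairs $(\epsilon_i, \delta_i)$, and `units` maps an index $j$ to the value forced on $y_j$ by a unit literal of $F$ whose variable occurs in the chain. The names `step`, `restrict` and `count_chain_with_units` are hypothetical.

```python
from fractions import Fraction

def step(pair, eps, delta):
    """One application of recurrence (2)."""
    alpha, beta = pair
    total = alpha + beta
    forced = alpha if eps == 1 else beta
    return (total, forced) if delta == 1 else (forced, total)

def restrict(pair, value):
    """Zero the component that disagrees with a forced value of the label."""
    if value is None:
        return pair
    return (pair[0], 0) if value == 1 else (0, pair[1])

def count_chain_with_units(signs, units):
    """Models of a linear chain in which y_j is forced to units[j]."""
    eps, delta = signs[0]
    if 0 in units:                     # unit literal on y_0, the chain's tail
        if units[0] == eps:
            pair = (1, 1)              # first clause already satisfied
        else:
            pair = (1, 0) if delta == 1 else (0, 1)
    else:
        pair = (2, 1) if delta == 1 else (1, 2)
    pair = restrict(pair, units.get(1))
    for j, (eps, delta) in enumerate(signs[1:], start=2):
        pair = restrict(step(pair, eps, delta), units.get(j))
    return sum(pair)

chain = [(1, 1)] * 5                            # the formula of Example 1
mu_sigma = count_chain_with_units(chain, {})    # 21
mu_both = count_chain_with_units(chain, {0: 1, 3: 0})   # x_0 true, x_3 false
t = 1                                           # x_6 does not occur in the chain
print(mu_both, Fraction(mu_both, 2**t * mu_sigma))      # 4 2/21
```

`Fraction` reduces the result, so the value $4/42$ of Example 3 is printed as `2/21`.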



Proposition 1. Let $\Sigma$ be a $(\le 2, 3\mu)$-CF, where $\Sigma$ can be
decomposed into two subformulas, $\Sigma = \Sigma_1 \cup \Sigma_2$, such that
$\Sigma_1$ is a $(2, 2\mu)$-CF and $\Sigma_2$ is a $(1, 1\mu)$-CF; then
$\mu(\Sigma)$ can be computed in polynomial time.

Proof: Suppose (without loss of generality) that
$\Sigma_2 = (\bigwedge_{j=1}^{k} (l_j))$. Then
$\mu(\Sigma) = \mu(\Sigma_1 \cup \Sigma_2) = \mu(\Sigma_1 \wedge \bigwedge_{j=1}^{k} l_j)$.
And we have shown (based on Equation (3)) how to efficiently compute
$\mu(\Sigma_1 \wedge \Sigma_2)$ where $\Sigma_1$ is a $(2, 2\mu)$-CF and
$\Sigma_2$ is a conjunction of unitary clauses.

Corollary 1. There is a subclass of formulas in $(\le 2, 3\mu)$-CF which are not
in $(\le 2, 2\mu)$-CF and where the #SAT problem can be computed in polynomial
time.

It is easy to determine whether $\Sigma$ can be decomposed as the above
proposition states, because we must merely split the clauses of $\Sigma$ into two
sets, one of which contains binary clauses and the other only unitary clauses.

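Let now $F$ be a clause, $F = (\bigvee_{j=1}^{k} l_j)$. Let
$A = \{l \in F \mid v(l) \notin v(\Sigma)\}$, and $t = |A|$. Considering $F$ as a
set of literals, let $F' = F - A$. We can compute $\mu(\Sigma \wedge F)$ as:
$\mu(\Sigma \wedge F) = \mu(\Sigma) \cdot 2^{t} - \mu(\Sigma \wedge \bar{F})$.
That is, we compute $\mu(\Sigma \wedge F)$ by extending the models of $\Sigma$ to
the variables which are in $v(F)$ but not in $v(\Sigma)$, and then eliminating
the assignments which falsify $\Sigma \cup F$.

As $F = (\bigvee_{j=1}^{k} l_j)$, then
$\bar{F} = (\bigwedge_{j=1}^{k} \bar{l}_j) = (\bigwedge_{x \in F'} \bar{x} \wedge \bigwedge_{x \in A} \bar{x})$.
Since $v(A) \cap (v(\Sigma) \cup v(F')) = \emptyset$, we can consider $\bar{A}$ a
connected component independent of $G_{\Sigma \cup \bar{F'}}$, where
$\bar{F'} = \bigwedge_{x \in F, v(x) \in v(\Sigma)} \bar{x}$. According to (1),
$\mu(\Sigma \wedge \bar{F}) = \mu(\Sigma \wedge \bar{F'}) \cdot \mu(\bar{A}) = \mu(\Sigma \wedge \bar{F'})$,
since $\mu(\bar{A}) = 1$; then:

$$P_{F|\Sigma} = \frac{\mu(\Sigma \wedge F)}{2^{t} \cdot \mu(\Sigma)} = \frac{\mu(\Sigma) \cdot 2^{t} - \mu(\Sigma \wedge \bar{F'})}{2^{t} \cdot \mu(\Sigma)} = 1 - \frac{\mu(\Sigma \wedge \bar{F'})}{2^{t} \cdot \mu(\Sigma)} \qquad (4)$$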

Notice that when $F$ is a phrase or a clause there is no restriction on the
number of literals that $F$ can contain. Thus, (4) permits us to solve #SAT
for formulas $(\Sigma \cup F)$ in a greater hierarchy than $(\le 2, 3\mu)$-CF,
since $F$ may contain clauses with more than 2 literals.

Example 4. Let $\Sigma$ be the formula from Example 1, and
$F = \{(x_0 \vee x_5)\}$. What is the value of $\mu(\Sigma \wedge F)$? Notice
that $\Sigma \cup F$ forms a cycle, and $\mu(\Sigma \wedge F)$ is the number of
models of that cycle. $A = \{x \in F \mid v(x) \notin v(\Sigma)\} = \emptyset$
and $F' = F$. Then
$\mu(\Sigma \wedge F) = \mu(\Sigma) \cdot 2^{0} - \mu(\Sigma \wedge \bar{F}) = \mu(\Sigma) - \mu(\Sigma \wedge (\bar{x}_0) \wedge (\bar{x}_5))$.
$\mu(\Sigma \wedge (\bar{x}_0) \wedge (\bar{x}_5)) = \mu_5$ is computed as
follows. As $x_0$ is the tail of the chain in $G_\Sigma$ and has a different
sign in $\bar{F}$ and $\Sigma$, then $(\alpha_1, \beta_1) = (1, 0)$, since
$\delta_1 = 1$. Next, $(1, 0) \to (1, 1) \to (2, 1) \to (3, 2) \to (5, 3)$, and
as $x_5$ also appears as a negated variable in $\bar{F}$, then
$\mu_5 = \beta_5 = 3$. Finally,
$P_{F|\Sigma} = 1 - \frac{3}{2^{0} \cdot \mu(\Sigma)} = 1 - \frac{3}{21} = \frac{18}{21}$.
This example shows another way to compute $\mu(\Sigma)$ when $G_\Sigma$ is a
cycle.
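
The identity behind Equation (4) can be checked numerically. The sketch below is an illustration only: it counts models by brute force instead of using the pair recurrence, it assumes that the literals of the query clause are over pairwise distinct variables, and it encodes a literal as `(variable, sign)`. The names `count_models` and `belief_in_clause` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def count_models(clauses, variables):
    """Brute-force model count over the given variables."""
    variables = sorted(variables)
    return sum(
        all(any(value[v] == s for v, s in c) for c in clauses)
        for bits in product((0, 1), repeat=len(variables))
        for value in [dict(zip(variables, bits))]
    )

def belief_in_clause(sigma, clause):
    """Degree of belief in a single clause, following Equation (4)."""
    sigma_vars = {v for c in sigma for v, _ in c}
    shared = [(v, s) for v, s in clause if v in sigma_vars]   # literals of F'
    t = len(clause) - len(shared)                             # |A|
    # The negation of F' is a conjunction of unit clauses with flipped signs.
    negated_units = [[(v, 1 - s)] for v, s in shared]
    mu_sigma = count_models(sigma, sigma_vars)
    mu_falsifying = count_models(sigma + negated_units, sigma_vars)
    return 1 - Fraction(mu_falsifying, 2**t * mu_sigma)

chain = [[(i, 1), (i + 1, 1)] for i in range(5)]      # formula of Example 1
print(belief_in_clause(chain, [(0, 1), (5, 1)]))      # 6/7, i.e. 18/21
print(belief_in_clause(chain, [(0, 1), (6, 1)]))      # 17/21, with t = 1
```

The second call uses a clause with one variable that does not occur in the chain, which exercises the factor $2^{t}$.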

Some methods for choosing among several possible revisions are based on some
implicit bias, namely the a priori probability that each element (literal or
clause) of the domain theory requires revision. Instead of having the
probability of each element $F$ of the theory $\Sigma$ assigned by an expert or
simply chosen by default [7], we have shown here a formal and efficient way to
determine such a probability based on the degree of belief $P_{F|\Sigma}$, with
the additional advantage that such probabilities can be adjusted automatically
in response to newly obtained information.

4.2 A Polynomial Superclass of (2, 2p)-CF for #SAT 

Let now $F$ be a $(2, 1\mu)$-CF. Let $A = \{l \in F \mid v(l) \notin v(\Sigma)\}$,
and $t = |A|$. Let $F = F_1 \cup F_2$ where
$F_2 = \{c \in F \mid v(c) \cap v(\Sigma) = \emptyset\}$, and $F_1 = F - F_2$. We
compute $\mu(\Sigma \wedge F)$ as
$\mu((\Sigma \wedge F_1) \cup F_2) = \mu(\Sigma \wedge F_1) \cdot \mu(F_2)$, since
$G_{F_2}$ is a connected component independent of $G_{\Sigma \cup F_1}$.
Considering $n_1 = |v(F_2)|$, then $\mu(F_2) = 3^{n_1/2}$, because $F_2$ consists
of $n_1/2$ binary clauses over pairwise disjoint variables.

We order the clauses in $\Sigma$ as:
$\Sigma = \{c_i = \{y_{i-1}^{\epsilon_i}, y_i^{\delta_i}\}\}_{i=1}^{m}$,
$\delta_i, \epsilon_i \in \{0, 1\}$ and $v(c_i) \cap v(c_{i+1}) = \{y_i\}$.
According to this order, we order the clauses of
$F_1 = \{D_1, \ldots, D_{m_1}\}$. For any two clauses $D_i = (l_1, l_2)$ and
$D_j = (l_3, l_4)$, $i < j$: if $v(l_1) \in v(\Sigma)$ then $v(l_1)$ appears in a
clause of $\Sigma$ before any other clause where the variables $v(l_2)$,
$v(l_3)$ or $v(l_4)$ appear. If $v(l_1) \notin v(\Sigma)$ then $v(l_2)$ appears
in a clause of $\Sigma$ before any other clause where the variables $v(l_3)$ or
$v(l_4)$ appear.

In order to compute $\mu(\Sigma \wedge F_1)$, while we apply (2) for each
$i = 1, \ldots, m$ in order to obtain the three pairs
$(\alpha_i, \beta_i)(s_j)$, $j = 1, 2, 3$, we determine whether the associated
label $y_i$ is in $v(F_1)$ or not.

If $y_i \notin v(F_1)$, no changes are applied to $(\alpha_i, \beta_i)(s_j)$,
$j = 1, 2, 3$, and we continue to compute the next values
$(\alpha_{i+1}, \beta_{i+1})(s_j)$, $j = 1, 2, 3$.

If $y_i \in v(F_1)$, the obtained values $(\alpha_i, \beta_i)(s_j)$,
$j = 1, 2, 3$, have to be changed. To explain this change, suppose
$D = (y_i^{e}, z^{\delta}) \in F_1$, $e, \delta \in \{0, 1\}$; then:
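1. $z \notin v(\Sigma)$: then each $(\alpha_i, \beta_i)(s_j)$, $j = 1, 2, 3$, is
modified according to the sign of $y_i$ in $D$, as:

$$(\alpha_i, \beta_i)(s_j) = \begin{cases} (2 \cdot \alpha_i, \beta_i)(s_j) & \text{if } e = 1, \\ (\alpha_i, 2 \cdot \beta_i)(s_j) & \text{otherwise.} \end{cases}$$

2. $z \in v(\Sigma)$ and a clause $c \in \Sigma$ exists such that
$c = (y_i^{e'}, z^{\delta'})$, $e', \delta' \in \{0, 1\}$. If $e = e'$ and
$\delta = \delta'$ then $D$ and $c$ are the same clause, so we do not modify
$(\alpha_i, \beta_i)$ and continue to compute the next pair. If $e \neq e'$ and
$\delta \neq \delta'$ then $y_i$ and $z$ are complementary literals in $D$ and
$c$, and we compute the next pairs of values as:

$$(\alpha_{i+1}, \beta_{i+1})(s_j) = \begin{cases} (\alpha_i, \beta_i)(s_j) & \text{if } e \neq \delta, \\ (\beta_i, \alpha_i)(s_j) & \text{otherwise.} \end{cases}$$

Otherwise, only one of the two literals $y_i$ or $z$ changes its sign, and then:

$$(\alpha_{i+1}, \beta_{i+1})(s_j) = \begin{cases}
(\alpha_i + \beta_i, 0)(s_j) & \text{if } \delta = \delta' = 1 \\
(0, \alpha_i + \beta_i)(s_j) & \text{if } \delta = \delta' = 0 \\
(2 \cdot \alpha_i, 0)(s_j) & \text{if } e = e' = 1 \\
(0, 2 \cdot \beta_i)(s_j) & \text{if } e = e' = 0
\end{cases}$$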

3. $z \in v(\Sigma)$ and the variables $y_i$, $z$ appear in different clauses of
$\Sigma$. Let $D = (y_i^{e}, z^{\delta}) \in F_1$ and
$c_i = (y_{i-1}^{\epsilon_i}, y_i^{\delta_i})$,
$c_k = (y_{k-1}^{\epsilon_k}, z^{\delta_k})$ be the two clauses of $\Sigma$, with
$z = y_k$. Notice that $i < k$, because of the order given over the clauses of
$\Sigma$ and of $F_1$. We begin a new series of values
$(\alpha_i^{l}, \beta_i^{l})(s_j)$, $j = 1, 2, 3$, at a new level $l$, where the
initial pair is:

$$(\alpha_i^{l}, \beta_i^{l})(s_j) = \begin{cases} (\alpha_i, 0)(s_j) & \text{if } e = 0, \\ (0, \beta_i)(s_j) & \text{if } e = 1. \end{cases}$$

The computation of the new series $(\alpha_i^{l}, \beta_i^{l})(s_j)$,
$j = 1, 2, 3$ (new level) continues in parallel with the computation of the main
series $(\alpha_i, \beta_i)(s_j)$, $j = 1, 2, 3$, $i = 1, \ldots, m$, until we
arrive at the pair $(\alpha_k, \beta_k)$, which has the associated label $y_k$.
All arithmetic rules applied to the main series $(\alpha_i, \beta_i)$ in order
to obtain the next three pairs $(\alpha_{i+1}, \beta_{i+1})$ must be applied to
$(\alpha_i^{l}, \beta_i^{l})$ in order to obtain the next pairs
$(\alpha_{i+1}^{l}, \beta_{i+1}^{l})$.

Notice that more levels of series $(\alpha_i^{(l)}, \beta_i^{(l)})(s_j)$,
$j = 1, 2, 3$, may need to be computed, and these are embedded in the main series
$(\alpha_i, \beta_i)(s_j)$, $j = 1, 2, 3$.

The new series $(\alpha_i^{l}, \beta_i^{l})(s_j)$, $j = 1, 2, 3$, stops being
active when we arrive at the clause $c_k \in \Sigma$. According to the sign of
$z \in v(F_1)$ we obtain the last pair as:

$$(\alpha_k^{l}, \beta_k^{l})(s_j) = \begin{cases} (\alpha_k^{l}, 0)(s_j) & \text{if } \delta = 0, \\ (0, \beta_k^{l})(s_j) & \text{otherwise.} \end{cases}$$

These last three pairs are used to modify the values in $(\alpha_k, \beta_k)$ of
the original series and in whatever pairs $(\alpha_k^{(l)}, \beta_k^{(l)})$ that
could be active at that moment and such that
$\alpha_k^{(l)}(s_j) \le \alpha_k^{l}(s_j)$ and
$\beta_k^{(l)}(s_j) \le \beta_k^{l}(s_j)$, $j = 1, 2, 3$. The new values are
obtained as:

$$(\alpha_k^{(l)}, \beta_k^{(l)})(s_j) = (Max\{0, \alpha_k^{(l)} - \alpha_k^{l}\}, Max\{0, \beta_k^{(l)} - \beta_k^{l}\})(s_j),$$

since the set of assignments denoted by $(\alpha_k^{l}, \beta_k^{l})$ is already
being considered in the set of assignments denoted by
$(\alpha_k^{(l)}, \beta_k^{(l)})$ and in $(\alpha_k, \beta_k)$.

All computation levels of the series $(\alpha_i^{(l)}, \beta_i^{(l)})(s_j)$,
$j = 1, 2, 3$, will end before arriving at the last pair
$(\alpha_m, \beta_m)(s_j)$, $j = 1, 2, 3$, in the main series, and according to
(iv), $\mu(\Sigma \cup F_1) = t(s_1) + t(s_2) + t(s_3)$.
The above procedure permits us to compute $\mu(\Sigma \wedge F_1)$; then we can
apply (3) in order to compute $P_{F|\Sigma}$ as:
$P_{F|\Sigma} = \frac{\mu(\Sigma \wedge F_1) \cdot \mu(F_2)}{2^{t} \cdot \mu(\Sigma)} = \frac{\mu(\Sigma \wedge F_1) \cdot 3^{n_1/2}}{2^{t} \cdot \mu(\Sigma)}$.
This last equation justifies the following proposition.



Proposition 2. There is a subclass, formed by $\Sigma_1 \cup \Sigma_2$, of the
class $(2, 3\mu)$-CF which is not in the class $(2, 2\mu)$-CF and such that the
#SAT problem can be computed in polynomial time, $\Sigma_1$ being a connected
component in $(2, 2\mu)$-CF and $\Sigma_2$ a $(2, 1\mu)$-CF.

On the other hand, if we do not put any restrictions on each of the formulas
$\Sigma$ and $F$ such that $\Sigma \cup F$ is a $(2, 3\mu)$-CF, then both the
#SAT problem and the degree of belief $P_{F|\Sigma}$ are #P-complete problems.



5 Conclusions 

We determined the degree of belief $P_{F|\Sigma}$ based on the conditional
probability of $F$ given $\Sigma$. Given an initial knowledge base $\Sigma$ as a
$(2, 2\mu)$-CF, we designed efficient procedures to compute $P_{F|\Sigma}$ when
$F$ is a conjunction of unitary clauses, or $F$ is a disjunction of unitary
clauses, or $F$ is a $(2, 1\mu)$-CF.

Considering the degree of belief as a conditional probability looks promising
for determining more polynomial classes of propositional formulas for computing
the degree of belief, as well as for the #SAT problem. Although #SAT is a
#P-complete problem for formulas in $(\le 2, 3\mu)$-CF without any restriction,
we have shown here a class of formulas given by $(\Sigma \cup F)$ which is in a
greater hierarchy than $(\le 2, 3\mu)$-CF and where both computing the degree of
belief $P_{F|\Sigma}$ and solving #SAT over $(\Sigma \cup F)$ can be done in
polynomial time.

Also, we have established new polynomial classes of conjunctive forms which
are subclasses of $(2, 3\mu)$-CF and superclasses of $(2, 2\mu)$-CF for the #SAT
problem. Each one of those classes can be described as the union of two formulas
$F \cup \Sigma$ such that $P_{F|\Sigma}$ is computed efficiently.

It is an open problem to determine the maximal polynomial subclass $\Gamma$ of
$(2, 3\mu)$-CF for the #SAT problem, as well as to know whether such a $\Gamma$
could always be described as the union of two formulas $F \cup \Sigma$ where
$P_{F|\Sigma}$ can be computed efficiently.

Abstract. Association rule mining is a descriptive data mining technique
that has attracted major interest in recent years, with applications in
several areas, such as electronic and traditional commerce, securities,
health care and geo-processing. This technique identifies intra-transaction
patterns in a database. An association rule describes how much the presence
of a set of attributes in a database record implies the presence of another,
distinct set of attributes in the same record. The possibility of discovering
all the existing associations in a transaction database is the most relevant
aspect of the technique. However, this characteristic leads to the generation
of a large number of rules, hindering the capacity of a human user to analyze
and interpret the extracted knowledge. The use of rule measures is important
for analyzing this knowledge. In this context we investigate, firstly, the
ability of several objective measures to act as filters for rule sets. Next,
we analyze how the combination of these measures can be used to identify the
most interesting rules. Finally, we apply the proposed technique to a rule set,
to illustrate its use in the post-processing phase.

Keywords: Association Rules, Post-processing, Objective Measures. 



1 Introduction 

Technological advances in data acquisition and storage systems, together with
increases in communication speed and reductions in the costs associated with
these technologies, have given organizations the capacity to store detailed
information about their operations, generating a considerable amount of
data. At the same time, the value of the information contained in this data has
been acknowledged by organizations. This scenario has stimulated research
aimed at automating the process of transforming data into knowledge.
This process of automatic identification of knowledge is known as data mining.

Among data mining techniques, association rules is the one that has aroused
the most interest [1]. In academia, several studies have been carried out, and
many organizations have been using this technique extensively in applications
in areas such as commerce, securities, health and geo-processing [2,3].

In general, the association rules technique describes how much the presence
of a set of attributes in a database record implies the presence of another,
distinct set of attributes in the same record. The possibility of finding all
existing associations is the most important aspect of the technique. However,
this characteristic leads to the generation of a huge number of rules, hindering
the interpretation of the rule set by the user. To deal with this problem,
knowledge post-processing techniques that allow the identification of
interesting rules have been the object of several studies. A methodology and a
tool for navigation and visualization of rules are proposed in [4]. This tool is
based on a set of operators that make it possible to focus on specific
characteristics of the rule set, considering different aspects. A framework that
combines statistical and graphical techniques for pruning rules is proposed in
[5]. A two-step methodology for rule selection is proposed in [6]: in the first
step the chi-square statistical measure is used, and then objective measures of
interestingness are applied.

In this context, we investigate the potential of objective measures derived
from the contingency table to be used as filters to identify interesting rules.
The main idea is to identify measures that follow Pareto's Law. This law states
that, for many phenomena, 80% of the consequences stem from 20% of the causes.
By applying this principle to the filtering of association rules, it is possible
to prioritize the set of rules to be presented to a specialist.

This investigation is carried out through the analysis of the graphical form of
the mathematical function of each measure and of its stability across distinct
databases. Using manual evaluation, we grouped the graphs by their graphical
form. This produced four groups of functions: linear decline, accentuated fall
after a constant plateau (landing), sigmoidal, and accentuated initial decline.
For a measure to be used as a filter, it is desirable that it follows the Pareto
principle (in the graph, this is represented by an accentuated initial decline).
It is also desirable that this graphical form keeps its characteristics
independently of the databases and of the parameters used in the data mining
process. We then analyze how combinations of these measures can be used to
further reduce the size of the rule set.



2 Background 

Association rules were first proposed for use in binary databases in [7].
Subsequent articles ([8,9], among others) extended and generalized this
approach, considering aspects such as continuous attributes, attribute
taxonomies, relational and distributed databases, and parallel processing.

Association rule mining is commonly stated as follows [7]. Let
$I = \{i_1, \ldots, i_n\}$ be a set of items, and let $D$ be a set of data
cases. Each data case consists of a subset of the items in $I$. An association
rule is an implication of the form $LHS \rightarrow RHS$, where
$LHS \subset I$, $RHS \subset I$, and $LHS \cap RHS = \emptyset$. The rule
$LHS \rightarrow RHS$ has support $s$ in $D$ if $s\%$ of the data cases in $D$
contain $LHS \cup RHS$ (Equation 1). The rule holds in $D$ with confidence $c$
if $c\%$ of the data cases in $D$ that support LHS also support RHS
(Equation 2). LHS and RHS are, respectively, the left-hand and right-hand sides
of a rule. The problem of mining association rules is to generate all
association rules whose support and confidence are greater than the
user-specified minimum support and minimum confidence.

$$\text{support} = f(LHS\,RHS) \tag{1}$$

$$\text{confidence} = f(RHS \mid LHS) = \frac{f(LHS\,RHS)}{f(LHS)} \tag{2}$$
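
As a minimal sketch of Equations 1 and 2, the following Python fragment computes
support and confidence over a small list of transactions. The function names and
the toy data are illustrative assumptions and are not part of the original
experiment.

```python
def support(transactions, lhs, rhs):
    """Fraction of transactions that contain every item of LHS and RHS."""
    items = set(lhs) | set(rhs)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing LHS, the fraction that also contain RHS."""
    lhs = set(lhs)
    covered = [set(t) for t in transactions if lhs <= set(t)]
    if not covered:
        return 0.0  # LHS never occurs, so the rule has no evidence
    both = lhs | set(rhs)
    return sum(1 for t in covered if both <= t) / len(covered)


# Hypothetical market-basket data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread"}, {"milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

With this toy data, the rule {bread} -> {milk} has support 0.5 (two of the four
transactions) and confidence 2/3 (two of the three transactions that contain
bread).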

Support and confidence are the most widely used measures in association rules,
whether in the selection of attribute subsets during rule generation or in the
post-processing of the acquired knowledge.

Besides support and confidence, other measures for knowledge evaluation have
been researched with the purpose of helping the user understand and use the
acquired knowledge [10,6,11,9]. These metrics are defined in terms of the
frequency counts tabulated in a 2 x 2 contingency table, as shown in Table 1.



Table 1. A contingency table for a rule LHS -> RHS

          RHS             ¬RHS
LHS       f(LHS RHS)      f(LHS ¬RHS)     f(LHS)
¬LHS      f(¬LHS RHS)     f(¬LHS ¬RHS)    f(¬LHS)
          f(RHS)          f(¬RHS)         N

¬X denotes the absence of X in a record.



The relative specificity measure represents the specificity gain in relation to
the rule LHS -> true. It is defined by Equation 3:

relative specificity = f{RHS \ LHS) — f(LHS) (3) 

The measure $J_1$ (Equation 4) represents the relationship between the
generality and the discriminative capacity of a rule, where the discriminative
capacity is considered in the cases in which $f(RHS \mid LHS)$ is greater than
$f(RHS)$ [9].

$$J_1 = f(LHS) \cdot f(RHS \mid LHS) \cdot \log_2 \frac{f(RHS \mid LHS)}{f(RHS)} \tag{4}$$
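
The following sketch computes the measure of Equation 4 from the counts of a
contingency table such as Table 1. It assumes that the argument of the logarithm
is the ratio f(RHS | LHS) / f(RHS), which is the usual form of this measure; the
function and parameter names are hypothetical.

```python
import math


def j1(n_lhs_rhs, n_lhs, n_rhs, n_total):
    """J1-style measure from contingency counts.

    Assumes the log argument is f(RHS | LHS) / f(RHS).
    """
    f_lhs = n_lhs / n_total
    f_rhs = n_rhs / n_total
    f_rhs_given_lhs = n_lhs_rhs / n_lhs
    if f_rhs_given_lhs == 0:
        return 0.0  # the product tends to 0 as f(RHS | LHS) tends to 0
    return f_lhs * f_rhs_given_lhs * math.log2(f_rhs_given_lhs / f_rhs)


# Hypothetical counts: LHS and RHS each occur in 76 of 100 records,
# always together (confidence 1.0, support 0.76).
value = j1(n_lhs_rhs=76, n_lhs=76, n_rhs=76, n_total=100)

# When LHS and RHS are statistically independent the measure is zero.
independent = j1(n_lhs_rhs=25, n_lhs=50, n_rhs=50, n_total=100)
```

Under this assumption, a rule with confidence 1.0 and support 0.76 in both
directions yields a value of about 0.30.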

The metrics presented are usually used in association rule post-processing to
sort rules. With Pareto analysis it is possible not only to order the rules but
also to establish a method to filter them. Pareto analysis is a commonly used
method of separating the major causes of a problem (the "vital few") from the
minor ones (the "trivial many"). It helps prioritize and focus resources where
they are most needed by showing where initial effort should be placed to
produce the most gain. It also helps measure the impact of an improvement by
comparing before and after conditions.
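
A Pareto analysis of this kind can be sketched in a few lines of Python: the
items are sorted by contribution and counted until a chosen share of the total
is reached. The function name, the 80% default and the sample numbers are
assumptions made for illustration.

```python
def vital_few(values, share=0.8):
    """Number of largest values needed to reach `share` of the total."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= share * total:
            return count
    return len(ordered)


# Hypothetical contributions of ten causes to a problem (sum = 100).
causes = [50, 35, 4, 3, 2, 2, 1, 1, 1, 1]
needed = vital_few(causes)
fraction_of_causes = needed / len(causes)
```

In this example two of the ten causes (20%) account for more than 80% of the
total, which is the pattern the Pareto principle describes.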

3 Analysis of Objective Measures

Experiments with ten databases and thirteen objective measures were carried out
to analyze the behavior of the functions represented by the objective measures,
with a view to their possible use in the knowledge post-processing phase. A
complete description of the experiment, including the databases, the
applications and the parameters used in the data mining process, and its
results, is available in [12].

3.1 Databases and Applications Used

Of the databases used in the experiment, two contain real data, one about urban
quality of life (UQL) and one about electronic commerce (BMS-POS), and eight are
artificial databases. The UQL database is composed of the set of quality-of-life
variables considered in the formulation of the Indicator of the Urban Quality of
Life, together with the index itself, for the urban sectors of the city of
São Carlos, SP, Brazil. BMS-POS^1 is a database that contains the transactions
of an electronic commerce web site over a period of two months; it was used at
KDD-Cup 2000 [13]. Its attributes are related to three categories of
information: click streams, order information and registration form.

The artificial databases in our experiments (called Art A, Art B, Art C, etc.)
were generated using the application Gen, which produces artificial databases
of synthetic data from user-specified parameters. This application is available
at www.almaden.ibm.com/cs/quest/syndata.html and is widely used in tests of
association rule algorithms. The attributes used are of the discrete type,
given the specific characteristics of the association rules problem.

The characteristics of all databases are described in Table 2, including the
average number of attributes per example (in a market-basket database, this is
the average number of items bought) and the maximum number of attributes per
rule, which limits the size of the rules.



Table 2. Characteristics of the databases and results of the data mining process

Database    Number of    Number of   Average       Minimum   Minimum      Maximum      Number of
            attributes   examples    attributes    support   confidence   attributes   rules
                                     per example                          per rule
Art A       1,000        1,000       25            6         25           5            44
Art B       10           25,000      5             6         25           5            2,000
Art C       10           25,000      9             6         25           5            2,500
Art D (2)   100          25,000      80            6         25           2            9,000
Art D (3)   100          25,000      80            6         25           3            440,000
Art D (4)   100          25,000      80            6         25           4            600,000
Art E       1,000        1,000       100           6         25           5            20,000
Art F       19           25,000      25            6         25           5            80,000
Art G       30           25,000      25            6         25           5            420,000
Art H       30           2,500       25            6         25           5            430,000
UQL         119          120         119           2         50           3            24,713
BMS-POS     1,657        515,597     -             2         25           5            70,000


1 We wish to thank Blue Martini Software for contributing the BMS-POS database. 

The databases were mined with varying minimum values of support and confidence.
To verify the influence of another parameter, database Art D was mined with
different values for the maximum number of attributes per rule: 2, 3 and 4
(the default value for this parameter is 5). Default values were used for the
other parameters. The results of the mining process are also displayed in
Table 2.



3.2 Graphical Analysis of the Measures 

In order to analyze the behavior of the functions, the values of each measure
were sorted individually in decreasing order and plotted in graphs, with the
values of the measure on the ordinate and the sequential number of the rule on
the abscissa. This procedure produced 182 graphs, one for each measure and for
each execution of the data mining algorithm.

The first step in the analysis was to identify classes of functions with
similar behavior. This was done by manually classifying each graph. We decided
to analyze the similarity of the graphs, rather than the likeness of the
equations, because we are interested in a more generic similarity, related
mainly to the decline of the curve. Another aspect considered was the size of
some databases, which would make it difficult to identify the equations.

Table 3 presents the results of this classification. It makes it possible to
verify the stability of the graphical behavior with respect to the databases
and to the mining parameters.

The data in Table 3 show that the measures relative negative confidence and
relative sensibility present the smallest variability, followed by relative
specificity, specificity and support. This indicates that those measures could
be used as filters in a system without user supervision. The other measures
would require more user supervision, as in the case of confidence, which shows
high variability across the different databases.

Category I contains the graphs of functions with an accentuated initial
decline. This is the most interesting format for use as a filter, since it is
the closest to the Pareto distribution. Categories L and C contain the graphs of
functions with a linear decline and with a decline after a constant plateau
(landing), respectively. Functions of these types do not follow the Pareto
distribution; they therefore have low restrictive capacity and are of little use
as filters. Category S groups the graphs of functions with a sigmoidal format.
Functions of this kind can be useful for analysis in localized regions of the
curve (beginning and end), since the intermediate part is linear and has little
effect as a filter.

Another aspect to be considered is that the measures vary in graph format,
which makes it necessary to verify the graph format of a measure for each rule
base.

These types of graphical behavior can be observed in Figures 1 to 3, which were
selected because they are representative of the study.

Table 3. Graphs classified in categories

Measure                        A  B  C  D(2)  E  UQL  BMS-POS  H  I  J  D(3)  D(4)  Mode
confidence                     L  C  C  C     I  C    C        C  L  C  C     C     C
conviction                     I  I  I  I     C  I    I        I  I  I  L     L     I
negative confidence            L  S  S  S     I  S    L        S  S  S  I     I     S
novelty                        C  I  S  S     I  I    I        S  S  S  I     I     I
relative confidence            L  I  S  I     I  I    L        I  S  S  I     I     I
relative negative confidence   S  I  I  I     I  I    I        I  I  I  I     I     I
relative sensibility           L  I  I  I     I  I    I        I  I  I  I     I     I
relative specificity           S  I  S  I     I  I    I        S  I  I  I     I     I
satisfaction                   L  I  S  I     I  I    C        L  S  S  I     I     I
sensibility                    I  I  I  L     C  I    I        L  I  I  L     L     I
specificity                    C  C  C  L     C  C    C        C  C  C  L     L     C
support                        I  I  I  I     L  I    I        I  I  I  L     L     I
J1                             S  I  S  S     I  I    I        S  S  S  I     I     I/S

Categories:
I  Initial accentuated decreasing
L  Linear decreasing
S  Sigmoidal
C  Decreasing after a plateau (landing)


In the graphs of the confidence measure (Figure 1), an approximately constant
initial portion can be observed, with an accentuated decrease only after 50% of
the total number of rules (Figures 1(a) and 1(c)). Measures with these
characteristics are usually not useful as filters. However, they can be used in
other types of analysis or as a "pre-filter". In addition, the graph of
Figure 1(b) differs considerably from the others. This indicates instability in
the graphical form of the measure, which also disqualifies it for use as an
isolated filter.

In the graphs of the support measure (Figure 2), an accentuated initial
decrease of the values can be identified, especially in Figures 2(a) and 2(b).
A detailed analysis of Figure 2(a) shows that over the first 750 rules (fewer
than 5% of all rules) the support value falls from 0.75 to 0.05, which accounts
for 96% of its total variation. For the remaining rules, support varies from
0.05 to 0.02 (the minimum support). This makes it possible to identify a reduced
number of rules, so this measure can be used as a filter.
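
The 96% figure can be read as the share of the measure's total range that is
covered by the top-ranked rules: (0.75 - 0.05) / (0.75 - 0.02) ≈ 0.96. The
sketch below computes this share for any list of measure values. The function
name, the 5% cut-off and the synthetic values are assumptions chosen to mirror
the numbers quoted above, not data from the experiment.

```python
def initial_drop_share(values, top_fraction=0.05):
    """Share of the total range (max - min) covered by the top-ranked values."""
    ordered = sorted(values, reverse=True)
    total_range = ordered[0] - ordered[-1]
    if total_range == 0:
        return 0.0  # a constant measure has no decline at all
    cutoff = max(1, int(len(ordered) * top_fraction))
    return (ordered[0] - ordered[cutoff]) / total_range


# Synthetic support values for 100 rules: a steep fall over the first
# 5 rules, then an almost flat tail down to the minimum support.
supports = [0.75, 0.40, 0.20, 0.10, 0.07] + [0.05] * 94 + [0.02]
share = initial_drop_share(supports)
```

A share close to 1 for a small `top_fraction` corresponds to the accentuated
initial decline of category I; a share close to `top_fraction` itself suggests
an approximately linear decline.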

Regarding the stability of the graph format of this measure, quite similar
behavior is observed across the databases. However, for the database Art D
(Figure 2(c)), the function decreases almost linearly until it reaches the
asymptote. Given this small instability, the measure is recommended as a filter
under the user's supervision. Measures with similar characteristics can also be
used as filters, with the same restrictions.

The relative specificity measure (Figure 3) presents an accentuated initial
decrease on all databases considered in the study. The use of this measure as a
filter for rule selection is therefore recommended: besides meeting the
requirement of an accentuated initial decrease, its graphical behavior remains
constant across distinct databases.



Fig. 1. Behavior of the confidence measure. Panels: (a) UQL, (b) Art A,
(c) Art D; x-axis: number of rules; y-axis: confidence.



A detailed look at Figure 3(c) shows that raising the minimum value of relative
specificity from 0.0 to 0.1 (a variation of 20%) reduces the number of rules by
approximately 82%, which demonstrates the filtering capacity of the measure.
Measures with similar characteristics can also be used as filters, with or
without the user's supervision.

It is fair to say that a measure used in isolation usually provides little
support for a valuable analysis of the rule sets. Its combined use, for example
through the composition of filters, can lead to even more useful results.

4 Using Measures as Rule Filters

In this section, we evaluate the use of the measures as individual or compound
filters, focusing on the reduction of the number of rules. The UQL database was
selected for this evaluation because the rules generated from it can be
interpreted.

The measures confidence, $J_1$, relative specificity and support were selected
because they represent different graph formats. The restrictive power of each
measure is analyzed by raising its minimum value.
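
The filtering step can be sketched as follows. The text does not spell out how a
percentage of elevation is converted into a threshold; this sketch assumes that
an elevation of p% moves the threshold from the observed minimum towards the
observed maximum by p% of the range. The function names and the sample rules are
hypothetical.

```python
def elevated_threshold(values, elevation):
    """Threshold after raising the minimum by `elevation` (0..1) of the range.

    Assumption: elevation is a fraction of (max - min), not of the minimum.
    """
    lowest, highest = min(values), max(values)
    return lowest + elevation * (highest - lowest)


def filter_rules(rules, measure, elevation):
    """Keep the rules whose `measure` value reaches the elevated threshold."""
    threshold = elevated_threshold([r[measure] for r in rules], elevation)
    return [r for r in rules if r[measure] >= threshold]


# Hypothetical rule set with support values only.
rules = [
    {"id": 1, "support": 0.75},
    {"id": 2, "support": 0.30},
    {"id": 3, "support": 0.05},
    {"id": 4, "support": 0.03},
    {"id": 5, "support": 0.02},
]

kept = filter_rules(rules, "support", elevation=0.05)
kept_ids = [r["id"] for r in kept]
```

For a measure with an accentuated initial decline, most rules sit just above the
minimum, so even a small elevation removes a large share of them.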

Table 4 shows the results of using each measure as an individual filter. In
general, support and relative specificity are the most restrictive measures. An
elevation of only 5% in the minimum value of each of these measures leads to a
reduction of approximately 90% in the number of rules, and with an elevation of
10% the reduction reaches about 99%. These measures can therefore be considered
very restrictive filters. In terms of the Pareto principle, 95% of the
consequences (5% of elevation) are produced by 10% of the causes, and 90% of the
consequences (10% of elevation) are produced by only 1% of the causes.

Fig. 2. Behavior of the support measure. Panels: (a) UQL, (b) Art A,
(c) Art D; x-axis: number of rules; y-axis: support.



Table 4. Number of selected rules (percentage of reduction in the number of
rules) for each percentage of elevation in the measure's minimum value

Measure                5%               10%              30%              50%              80%
Confidence             21,169 (14.34)   21,095 (14.64)   19,526 (20.99)   15,467 (37.41)   12,360 (49.99)
Support                2,137 (91.35)    303 (98.77)      131 (99.47)      74 (99.70)       2 (99.99)
Relative specificity   2,888 (88.31)    234 (99.05)      57 (99.77)       11 (99.96)       2 (99.99)
J1                     16,139 (34.69)   11,143 (54.91)   1,414 (94.28)    296 (98.80)      12 (99.95)



The $J_1$ measure can be considered a rather restrictive filter, because an
expressive decrease in the number of rules occurs only with a reduction greater
than 50%. The confidence measure takes its maximum value for more than 50% of
the rules and is therefore not suitable for use as an isolated filter.

Fig. 3. Behavior of the relative specificity measure. Panels: (a) UQL,
(b) Art A, (c) Art D; x-axis: number of rules; y-axis: relative specificity.

The combined use of measures increases the restrictive power of the filters.
Table 5 presents the results for an elevation of 5%, the least restrictive
percentage, which represents about 95% of the phenomenon (the measure value).
The values on the main diagonal refer to the use of each measure individually.
The other values refer to the pairwise combination of the measures. Among the
analyzed measures, support produces the most expressive reduction in the number
of rules when combined with another measure, and confidence the least
expressive. The smallest number of rules was obtained by combining relative
specificity and support: 1,585 rules selected from the 24,713 rules, a reduction
of approximately 93%.
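
A compound filter of this kind can be sketched as the intersection of the rule
sets selected by each measure. As before, the conversion of an elevation
percentage into a threshold (a fraction of the observed range of each measure)
is an assumption, and the function names and sample values are hypothetical.

```python
def select(rules, measure, elevation):
    """Ids of the rules whose `measure` value reaches the elevated threshold."""
    values = [m[measure] for m in rules.values()]
    lowest, highest = min(values), max(values)
    threshold = lowest + elevation * (highest - lowest)  # assumed definition
    return {rule_id for rule_id, m in rules.items() if m[measure] >= threshold}


def combined_filter(rules, measures, elevation):
    """Intersection of the rule sets selected by each individual measure."""
    selected = set(rules)
    for measure in measures:
        selected &= select(rules, measure, elevation)
    return selected


# Hypothetical rules with two measures each.
rules = {
    "r1": {"support": 0.76, "confidence": 1.00},
    "r2": {"support": 0.60, "confidence": 0.50},
    "r3": {"support": 0.05, "confidence": 0.95},
    "r4": {"support": 0.02, "confidence": 0.25},
}

by_support = select(rules, "support", 0.5)
by_both = combined_filter(rules, ["support", "confidence"], 0.5)
```

Because a rule with a high value in one measure does not necessarily have a high
value in another, the intersection is never larger than the smallest individual
selection.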

Besides the pairwise combination of measures, the use of the four measures as a
single filter was also explored. The results are described in Table 6. An
expressive reduction in the number of rules is observed in all situations and,
from 10% of elevation (representing 90% of the phenomenon), the number of rules
is small enough for each rule to be analyzed individually by a domain
specialist. It is also observed that, for elevation percentages above 30%, the
combination of measures is not efficient, since it brings practically no
further reduction in the number of rules.

Table 5. Number of rules selected for an elevation of 5%

Measure                Confidence   Support   Relative specificity   J1
Confidence             21,169       2,129     2,751                  13,016
Support                2,129        2,137     1,585                  1,732
Relative specificity   2,751        1,585     2,888                  2,843
J1                     13,016       1,732     2,843                  16,139




Table 6. Number of filtered rules, combining the four measures

Percentage of elevation   5%      10%   30%   50%   80%
Number of rules           1,355   168   52    11    2




Table 7 presents the 11 (eleven) rules obtained with an elevation of 50%
(Table 6). These are the same rules obtained in Table 4 with the relative
specificity measure.

To evaluate the quality of the filtering process, the set of rules was
presented to a specialist. Rules 1 to 4 and 9 to 10 can be considered "obvious
knowledge", since they involve highly correlated attributes: dpi_p, porpdimp
and pdp. Rules 5 to 8 were considered interesting by the specialist, especially
rules 7 and 8, which contradict expectations: for low values of dpi_p and
porpdimp, higher values of Pess_Dom were expected.



5 Final Considerations 

Interest in knowledge post-processing as a research topic has grown in recent
years. This is motivated mainly by the increasing use of extracted knowledge in
practical applications, and by the great volume of this knowledge, which makes
its manual analysis impracticable. When patterns are extracted as association
rules, the amount of extracted knowledge is an even more significant problem,
since this technique is known for producing a huge number of rules. The purpose
of using objective measures to evaluate rules in the post-processing step is to
establish filters for rule selection, since these measures indicate the
hypothetical strength of the association between LHS and RHS.

The experiment to validate measures and combinations of measures as pre-filters
or filters for association rule selection was carried out using ten databases
with different characteristics, mined with varied parameters. It is therefore
reasonable to consider the results of this experiment sufficiently general to
be extrapolated to other cases.

Analyzing the use of measures as filters (individual measures, measures
combined two-by-two, and the combination of all measures), we observed
reductions in the number of rules of about 91%, 93% and 95%, respectively, for
an elevation of 5% in the measure's minimum value (Tables 4, 5 and 6,
respectively). We also noticed that, for higher percentages of elevation in the
measure's minimum value, the additional reduction obtained by combining the
measures was less expressive, with the combined and individual results becoming
equal from 30% of elevation.

Table 7. Selected rules to be presented to the domain expert for evaluation

Rule  LHS -> RHS                                                                   Conf.  Sup.  Rel. Spec.  J1
1     dpi_p in (... 12.0%] -> porpdimp in (... 3.0%]                                1.00   0.76  0.76        0.29
2     porpdimp in (... 3.0%] -> dpi_p in (... 12.0%]                                1.00   0.76  0.76        0.29
3     dpi_p in (... 12.0%] -> porpdp in (99.8% ...]                                 0.94   0.60  0.46        0.17
4     porpdimp in (... 3.0%] -> porpdp in (99.8% ...]                               0.93   0.60  0.46        0.17
5     dpi_p in (... 12.0%] -> SInstSan in (... 0.5%] and porpdimp in (... 3.0%]     1.00   0.57  0.57        0.22
6     porpdimp in (... 3.0%] -> SInstSan in (... 0.5%] and dpi_p in (... 12.0%]     1.00   0.57  0.57        0.22
7     dpi_p in (... 12.0%] -> Pess_Dom in (2.5 ... 3.5] and porpdimp in (... 3.0%]  1.00   0.51  0.51        0.20
8     porpdimp in (... 3.0%] -> Pess_Dom in (2.5 ... 3.5] and dpi_p in (... 12.0%]  1.00   0.51  0.51        0.20
9     dpi_p in (... 12.0%] -> porpdimp in (... 3.0%] and porpdp in (99.8% ...]      1.00   0.60  0.60        0.23
10    porpdimp in (... 3.0%] -> dpi_p in (... 12.0%] and porpdp in (99.8% ...]      1.00   0.60  0.60        0.23
11    dpi_p in (... 12.0%] and porpdimp in (... 3.0%] -> porpdp in (99.8% ...]      0.93   0.60  0.46        0.17

Legend
dpi_p     Percentage of improvised permanent domiciles
porpdimp  Percentage of improvised domiciles
porpdp    Percentage of permanent domiciles
SInstSan  Percentage of domiciles without sanitary installations
Pess_Dom  Average number of people per domicile

In this work, we verified that some measures follow the Pareto principle and
others do not; those that do are more appropriate for building filters. The
other measures can be combined and used as "pre-filters" or as complementary
measures. In addition, the combined use of measures can greatly reduce the
number of rules, since rules with a high value in one measure do not
necessarily have a high value in another. Thus, the intersection of the rule
sets selected by each measure produces a smaller rule set, although it should
be applied with care for high percentages of elevation in the minimum values of
the measures.

According to the expert evaluation, interesting and useful rules were
identified in the set of rules presented; they represent about 30% of the
selected rules. It is worth remembering that the applied filter was highly
restrictive, reducing the number of rules from 24,713 to just 11. Less
restrictive filters will produce larger selected sets that can still be
interpreted by the specialist and will certainly contain more interesting
rules.

In this way, we verified that the proposed method makes it possible to
prioritize the analysis of the rules with the highest contribution to the
phenomenon (the measures), which are considered to have the highest potential
to be interesting. The method becomes more useful when incorporated into an
interactive analysis system, in which the user can vary the parameters, select
different measures and combinations of them, and check whether the results are
satisfactory.


Abstract. Two dynamic cluster methods for interval data are presented:
the first method furnishes a partition of the input data and a
corresponding prototype (a vector of intervals) for each class by
optimizing an adequacy criterion based on Mahalanobis distances between
vectors of intervals, and the second is an adaptive version of the first
method. In order to show the usefulness of these methods, synthetic and
real interval data sets are considered. The synthetic interval data sets
are obtained from quantitative data sets drawn according to bi-variate
normal distributions. The adaptive method outperforms the non-adaptive
one with respect to the average behaviour of a cluster quality measure.



1 Introduction 

Cluster analysis has been widely used in numerous fields, including pattern
recognition, data mining and image processing. Its aim is to group data into
clusters such that objects within a cluster have a high degree of similarity,
whereas objects belonging to different clusters have a high degree of
dissimilarity.

The dynamic cluster algorithm [4] is a partitional clustering method whose
aim is to obtain both a single partition $P = (C_1, \ldots, C_K)$ of the input
data into $K$ clusters and its corresponding set of representatives or
prototypes $L = (L_1, \ldots, L_K)$ by locally minimizing an adequacy criterion
$W(P, L)$ which measures the fit between the clusters and their
representation. This criterion is defined as:

$$W(P, L) = \sum_{k=1}^{K} \Delta_k(L_k) \tag{1}$$

where $\Delta_k(L_k)$ measures the adequacy between the cluster $C_k$ and its
representative $L_k$.

The k-means algorithm, with class prototypes updated after all objects have
been considered for relocation, is a particular case of dynamic clustering in
which the adequacy function is the squared error criterion and the class
prototypes are the centers of gravity of the clusters [7].



C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 454—463, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 

In the adaptive version of the dynamic cluster method [3], at each iteration
there is a different measure for comparing each cluster with its own
representation. The advantage of these adaptive distances is that the
clustering algorithm is able to recognize clusters of different shapes and
sizes. The optimization problem is now to find a partition
$P = (C_1, \ldots, C_K)$ of the input data into $K$ clusters, its corresponding
set of prototypes $L = (L_1, \ldots, L_K)$ and a set of distances
$d = (d_1, \ldots, d_K)$ by locally minimizing an adequacy criterion

$$W(P, L, d) = \sum_{k=1}^{K} \Delta_k(L_k, d_k) \tag{2}$$

where $\Delta_k(L_k, d_k)$ measures the adequacy between the cluster $C_k$ and
its representative $L_k$ using the distance $d_k$.

Often, objects to be clustered are represented as a vector of quantitative
features. However, the recording of interval data has become common practice in
real-world applications, and nowadays this kind of data is widely used to
describe objects. Symbolic Data Analysis (SDA) is a new area related to
multivariate analysis and pattern recognition, which has provided suitable data
analysis methods for managing objects described as a vector of intervals [1].

Concerning dynamic cluster algorithms for interval data, SDA has provided
suitable tools. Verde et al. [10] introduced an algorithm considering
context-dependent proximity functions, and Chavent and Lechevallier [2] proposed
an algorithm using an adequacy criterion based on the Hausdorff distance.
Moreover, an adaptive dynamic cluster algorithm for interval data based on
City-block distances is presented in [9].

The aim of this paper is to introduce two dynamic cluster methods for interval
data. The first method furnishes a partition of the input data and a
corresponding prototype (a vector of intervals) for each class by optimizing an
adequacy criterion based on Mahalanobis distances between vectors of intervals.
The second is an adaptive version of the first method.

In these methods, the prototype of each cluster is represented by a vector of
intervals in which the bounds of each interval are, for a fixed variable, the
average of the lower bounds and the average of the upper bounds, respectively,
of the intervals of the objects belonging to the cluster for the same variable.
In order to show the usefulness of these methods, several synthetic interval
data sets with different degrees of clustering difficulty and an application
with a real data set were considered. The evaluation of the clustering results
is based on an external validity index.

This paper is organized as follows. Sections 2 and 3 present, respectively, the
non-adaptive and adaptive dynamic cluster algorithms for interval data based on
Mahalanobis distances. Section 4 presents the experimental results with several
synthetic interval data sets and an application with a real data set. Finally,
Section 5 gives the conclusions.

2 A Dynamical Cluster Method

Let $E = \{s_1, \ldots, s_n\}$ be a set of $n$ patterns described by $p$
interval variables. Each pattern $s_i$ ($i = 1, \ldots, n$) is represented as a
vector of intervals
$\mathbf{x}_i = ([a_i^1, b_i^1], \ldots, [a_i^p, b_i^p])^T$. Each cluster $C_k$
($k = 1, \ldots, K$) has a prototype $L_k$ that is also represented as a vector
of intervals
$\mathbf{y}_k = ([\alpha_k^1, \beta_k^1], \ldots, [\alpha_k^p, \beta_k^p])^T$.

Let $\mathbf{x}_{iL} = (a_i^1, \ldots, a_i^p)^T$ and
$\mathbf{x}_{iU} = (b_i^1, \ldots, b_i^p)^T$ be two vectors, respectively, of
the lower and upper bounds of the intervals describing $\mathbf{x}_i$. Consider
also $\mathbf{y}_{kL} = (\alpha_k^1, \ldots, \alpha_k^p)^T$ and
$\mathbf{y}_{kU} = (\beta_k^1, \ldots, \beta_k^p)^T$, two vectors,
respectively, of the lower and upper bounds of the intervals describing
$\mathbf{y}_k$.

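We define the Mahalanobis distance between the two vectors of intervals
$\mathbf{x}_i$ and $\mathbf{y}_k$ as:

$$d(\mathbf{x}_i, \mathbf{y}_k) = d(\mathbf{x}_{iL}, \mathbf{y}_{kL}) + d(\mathbf{x}_{iU}, \mathbf{y}_{kU}) \tag{3}$$

where

$$d(\mathbf{x}_{iL}, \mathbf{y}_{kL}) = (\mathbf{x}_{iL} - \mathbf{y}_{kL})^T M_L (\mathbf{x}_{iL} - \mathbf{y}_{kL}) \tag{4}$$

is the Mahalanobis distance between the two vectors $\mathbf{x}_{iL}$ and
$\mathbf{y}_{kL}$, and

$$d(\mathbf{x}_{iU}, \mathbf{y}_{kU}) = (\mathbf{x}_{iU} - \mathbf{y}_{kU})^T M_U (\mathbf{x}_{iU} - \mathbf{y}_{kU}) \tag{5}$$

is the Mahalanobis distance between the two vectors $\mathbf{x}_{iU}$ and
$\mathbf{y}_{kU}$.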
The matrices $M_L$ and $M_U$ are defined, respectively, as:

(i) $M_L = (\det(Q_{poolL}))^{1/p} Q_{poolL}^{-1}$, where $Q_{poolL}$ is the pooled covariance
matrix with $\det(Q_{poolL}) \neq 0$, i.e.,

$$Q_{poolL} = \frac{(n_1 - 1) S_{1L} + \ldots + (n_K - 1) S_{KL}}{n_1 + \ldots + n_K - K} \qquad (6)$$

In equation (6), $S_{kL}$ is the covariance matrix of the set of vectors
$\{x_{iL} : s_i \in C_k\}$ and $n_k$ is the cardinal of $C_k$ ($k = 1, \ldots, K$).

(ii) $M_U = (\det(Q_{poolU}))^{1/p} Q_{poolU}^{-1}$, where $Q_{poolU}$ is the pooled covariance
matrix with $\det(Q_{poolU}) \neq 0$, i.e.,

$$Q_{poolU} = \frac{(n_1 - 1) S_{1U} + \ldots + (n_K - 1) S_{KU}}{n_1 + \ldots + n_K - K} \qquad (7)$$

In equation (7), $S_{kU}$ is the covariance matrix of the set of vectors
$\{x_{iU} : s_i \in C_k\}$ and $n_k$ is again the cardinal of $C_k$ ($k = 1, \ldots, K$).
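As a rough illustration of equations (3) to (7), the sketch below restricts itself to p = 2 interval variables so that the determinant and the inverse can be written by hand. The helper names (`covariance_2d`, `pooled_covariance`, `normalized_inverse`, `interval_distance`) and the toy data are assumptions made for this example only; they do not come from the paper.

```python
# Sketch for p = 2 interval variables; all helper names are hypothetical.

def covariance_2d(points):
    """Sample covariance matrix (divisor n - 1) of a list of 2-D vectors."""
    n = len(points)
    m0 = sum(v[0] for v in points) / n
    m1 = sum(v[1] for v in points) / n
    s00 = sum((v[0] - m0) ** 2 for v in points) / (n - 1)
    s11 = sum((v[1] - m1) ** 2 for v in points) / (n - 1)
    s01 = sum((v[0] - m0) * (v[1] - m1) for v in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def pooled_covariance(clusters):
    """Equations (6)/(7): per-cluster covariances weighted by (n_k - 1)."""
    denom = sum(len(c) for c in clusters) - len(clusters)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for c in clusters:
        s = covariance_2d(c)
        for r in range(2):
            for t in range(2):
                q[r][t] += (len(c) - 1) * s[r][t] / denom
    return q


def normalized_inverse(q):
    """M = det(Q)^(1/p) Q^(-1) for p = 2; needs det(Q) > 0."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    if det <= 0:
        raise ValueError("singular covariance matrix")
    scale = det ** 0.5 / det
    return [[scale * q[1][1], -scale * q[0][1]],
            [-scale * q[1][0], scale * q[0][0]]]


def quad_form(u, v, m):
    """(u - v)^T M (u - v), as in equations (4) and (5)."""
    d = (u[0] - v[0], u[1] - v[1])
    return sum(d[r] * m[r][t] * d[t] for r in range(2) for t in range(2))


def interval_distance(x_low, x_up, y_low, y_up, m_low, m_up):
    """Equation (3): lower-bound term plus upper-bound term."""
    return quad_form(x_low, y_low, m_low) + quad_form(x_up, y_up, m_up)


# Toy data: two clusters, each given by its lower- and upper-bound vectors.
lower = [[(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)],
         [(10.0, 10.0), (12.0, 11.0), (11.0, 14.0)]]
upper = [[(1.0, 2.0), (4.0, 2.0), (2.0, 5.0)],
         [(11.0, 12.0), (13.0, 12.0), (13.0, 15.0)]]
m_low = normalized_inverse(pooled_covariance(lower))
m_up = normalized_inverse(pooled_covariance(upper))
det_m_low = m_low[0][0] * m_low[1][1] - m_low[0][1] * m_low[1][0]
d = interval_distance((0.0, 0.0), (1.0, 2.0), (10.0, 10.0), (11.0, 12.0), m_low, m_up)
```

The scaling by det(Q)^(1/p) is what forces the resulting matrix to have determinant 1, which `det_m_low` lets one check numerically.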



2.1 The Optimization Problem

According to the standard dynamic cluster algorithm, our method looks for a
partition $P = (C_1, \ldots, C_K)$ of a set of objects into $K$ clusters and its
corresponding set of prototypes $L = (L_1, \ldots, L_K)$ by locally minimizing an
adequacy criterion usually defined in the following way:
$$W_1(P, L) = \sum_{k=1}^{K} \Delta_k(L_k) = \sum_{k=1}^{K} \sum_{i \in C_k} d(x_i, y_k) \qquad (8)$$

where $d(x_i, y_k)$ is a distance measure between a pattern $s_i \in C_k$ and the class
prototype $L_k$ of $C_k$.

In this method the optimization problem is stated as follows: find the vector
of intervals $y_k = ([\alpha_k^1, \beta_k^1], \ldots, [\alpha_k^p, \beta_k^p])$ which locally minimizes the following
adequacy criterion:

$$\Delta_k(L_k) = \sum_{i \in C_k} (x_{iL} - y_{kL})^T M_L (x_{iL} - y_{kL}) + \sum_{i \in C_k} (x_{iU} - y_{kU})^T M_U (x_{iU} - y_{kU}) \qquad (9)$$

The problem now becomes that of finding the two vectors $y_{kL}$ and $y_{kU}$ minimizing
the criterion $\Delta_k(L_k)$. According to [5], the solutions for $y_{kL}$ and $y_{kU}$ are obtained
from the Huygens theorem. They are, respectively, the mean vectors of the sets
$\{x_{iL} : s_i \in C_k\}$ and $\{x_{iU} : s_i \in C_k\}$.

Therefore, $y_k$ is a vector of intervals whose bounds are, for each variable $j$,
respectively, the average of the set of lower bounds and the average of the set of
upper bounds of the intervals of the objects belonging to the cluster $C_k$.
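The prototype described above can be computed in a few lines. The following is a minimal sketch in which a pattern is assumed to be stored as a list of `(lower, upper)` pairs, one per variable; the function name `prototype` and the sample cluster are illustrative only.

```python
def prototype(cluster):
    """Return the interval prototype of a cluster.

    `cluster` is a non-empty list of patterns; each pattern is a list of
    (lower, upper) pairs, one pair per interval variable.
    """
    n = len(cluster)
    p = len(cluster[0])
    return [
        (sum(x[j][0] for x in cluster) / n,   # average of the lower bounds
         sum(x[j][1] for x in cluster) / n)   # average of the upper bounds
        for j in range(p)
    ]


# Two patterns described by two interval variables.
cluster = [[(1.0, 3.0), (10.0, 14.0)],
           [(3.0, 5.0), (12.0, 18.0)]]
y = prototype(cluster)
```

Here `y` is the vector of intervals ([2, 4], [11, 16]), the bound-wise average of the two patterns.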

2.2 The Algorithm

The dynamic cluster algorithm with non-adaptive Mahalanobis distance has the
following steps:

1. Initialization. Randomly choose a partition $\{C_1, \ldots, C_K\}$ of $E$.

2. Representation Step. For $k = 1$ to $K$ compute the vector
   $y_k = ([\alpha_k^1, \beta_k^1], \ldots, [\alpha_k^p, \beta_k^p])$, where $\alpha_k^j$ is the average of
   $\{a_i^j : s_i \in C_k\}$ and $\beta_k^j$ is the average of $\{b_i^j : s_i \in C_k\}$, $j = 1, \ldots, p$.

3. Allocation Step.
   test <- 0
   for i = 1 to n do
       define the cluster $C_{k^*}$ such that
       $k^* = \arg\min_{k = 1, \ldots, K} \, (x_{iL} - y_{kL})^T M_L (x_{iL} - y_{kL}) + (x_{iU} - y_{kU})^T M_U (x_{iU} - y_{kU})$
       if $s_i \in C_k$ and $k^* \neq k$
           test <- 1
           $C_{k^*} \leftarrow C_{k^*} \cup \{s_i\}$
           $C_k \leftarrow C_k \setminus \{s_i\}$

4. Stopping Criterion.
   If test = 0 then STOP, else go to (2).
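The four steps can be sketched as a short Python loop. This is only an illustration under simplifying assumptions: the matrices `m_low` and `m_up` are passed in and held fixed (the example uses identity matrices), whereas in the method they are derived from the pooled covariance matrices; a pattern is moved only when another prototype is strictly closer, which is a tie-handling choice made here to guarantee termination. All names are hypothetical.

```python
import random


def quad(u, v, m):
    """(u - v)^T M (u - v) for vectors of any dimension p."""
    d = [a - b for a, b in zip(u, v)]
    return sum(d[r] * m[r][c] * d[c] for r in range(len(d)) for c in range(len(d)))


def mean(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(len(vectors[0]))]


def dynamic_cluster(lows, ups, k, m_low, m_up, seed=0):
    """Return one cluster label per pattern (steps 1 to 4 with fixed matrices)."""
    n = len(lows)
    labels = [i % k for i in range(n)]        # Step 1: every cluster non-empty,
    random.Random(seed).shuffle(labels)       # then shuffled at random.
    while True:
        # Step 2 (representation): averages of the lower and upper bounds.
        protos = {}
        for c in set(labels):
            idx = [i for i in range(n) if labels[i] == c]
            protos[c] = (mean([lows[i] for i in idx]), mean([ups[i] for i in idx]))

        def dist(i, c):
            return quad(lows[i], protos[c][0], m_low) + quad(ups[i], protos[c][1], m_up)

        # Step 3 (allocation): move a pattern only if another prototype is strictly closer.
        test = 0
        for i in range(n):
            best = min(protos, key=lambda c: dist(i, c))
            if dist(i, best) < dist(i, labels[i]):
                labels[i] = best
                test = 1
        # Step 4 (stopping criterion).
        if test == 0:
            return labels


identity = [[1.0, 0.0], [0.0, 1.0]]
lows = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
ups = [(a + 1, b + 1) for a, b in lows]
labels = dynamic_cluster(lows, ups, 2, identity, identity)
```

On this toy input the two groups of rectangles are far apart, so the first three patterns end up in one cluster and the last three in the other.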
3 An Adaptive Dynamical Cluster Method

The dynamic cluster algorithm with adaptive distances [3] also has a
representation and an allocation step, but there is a different distance associated
to each cluster. According to the intra-class structure of the cluster $C_k$, we
consider here an adaptive Mahalanobis distance between an object $s_i$ and a
prototype $L_k$, which is defined as:

$$d_k(x_i, y_k) = (x_{iL} - y_{kL})^T M_{kL} (x_{iL} - y_{kL}) + (x_{iU} - y_{kU})^T M_{kU} (x_{iU} - y_{kU}) \qquad (10)$$

where $M_{kL}$ and $M_{kU}$ are matrices associated to the cluster $C_k$, both of
determinant equal to 1.

3.1 The Optimization Problem

The algorithm looks for a partition in $K$ clusters, its corresponding $K$ prototypes
and $K$ different distances associated with the clusters by locally minimizing an
adequacy criterion which is usually stated as:

$$W_2(P, L, d) = \sum_{k=1}^{K} \Delta_k(L_k, d_k) = \sum_{k=1}^{K} \sum_{i \in C_k} d_k(x_i, y_k) \qquad (11)$$

where $d_k(x_i, y_k)$ is an adaptive dissimilarity measure between a pattern $s_i \in C_k$
and the class prototype $L_k$ of $C_k$.

The optimization problem has two stages:

a) The class $C_k$ and the matrices $M_{kL}$ and $M_{kU}$ ($k = 1, \ldots, K$) are fixed.
   We look for the prototype $L_k$ of the class $C_k$ which locally minimizes

   $$\Delta_k(L_k, d_k) = \sum_{i \in C_k} (x_{iL} - y_{kL})^T M_{kL} (x_{iL} - y_{kL}) + \sum_{i \in C_k} (x_{iU} - y_{kU})^T M_{kU} (x_{iU} - y_{kU}) \qquad (12)$$

   As we know from subsection 2.1, the solutions for $\alpha_k^j$ and $\beta_k^j$ are,
   respectively, the average of $\{a_i^j : s_i \in C_k\}$, the lower bounds of the intervals
   $[a_i^j, b_i^j]$, $s_i \in C_k$, and the average of $\{b_i^j : s_i \in C_k\}$, the upper bounds of the
   intervals $[a_i^j, b_i^j]$, $s_i \in C_k$.

b) The class $C_k$ and the prototypes $L_k$ ($k = 1, \ldots, K$) are fixed. We look for the
   distance $d_k$ of the class $C_k$ which locally minimizes the criterion $\Delta_k(L_k, d_k)$
   with $\det(M_{kL}) = 1$ and $\det(M_{kU}) = 1$.

   According to [3], the solutions are: $M_{kL} = (\det Q_{kL})^{1/p} Q_{kL}^{-1}$, where $Q_{kL}$
   is the covariance matrix of the lower bounds of the intervals belonging to
   the class $C_k$ with $\det(Q_{kL}) \neq 0$, and $M_{kU} = (\det Q_{kU})^{1/p} Q_{kU}^{-1}$, where $Q_{kU}$
   is the covariance matrix of the upper bounds of the intervals belonging to
   the class $C_k$ with $\det(Q_{kU}) \neq 0$.
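That the closed-form solutions satisfy the determinant constraint can be checked directly. The short derivation below uses only two standard facts, $\det(cA) = c^p \det(A)$ for a $p \times p$ matrix $A$ and $\det(A^{-1}) = 1/\det(A)$:

```latex
\det(M_{kL})
  = \det\!\left( (\det Q_{kL})^{1/p} \, Q_{kL}^{-1} \right)
  = \left( (\det Q_{kL})^{1/p} \right)^{p} \det\!\left( Q_{kL}^{-1} \right)
  = \frac{\det Q_{kL}}{\det Q_{kL}}
  = 1 .
```

The same computation applies to $M_{kU}$, and to the pooled matrices $M_L$ and $M_U$ of section 2.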







3.2 The Algorithm

The initialization, the allocation step and the stopping criterion are nearly the
same in the adaptive and non-adaptive dynamic cluster algorithms. The main
difference between these algorithms occurs in the representation step, where the
matrices $M_{kL} = (\det(Q_{kL}))^{1/p} Q_{kL}^{-1}$ and $M_{kU} = (\det(Q_{kU}))^{1/p} Q_{kU}^{-1}$ are
computed for each class $k$ ($k = 1, \ldots, K$).

Remark. If a single number is considered as an interval with equal lower and up- 
per bounds, the results furnished by these symbolic-oriented methods are identi- 
cal to those furnished by the standard numerical ones [3] when usual data (vector 
of single quantitative values) are used. Indeed, the clusters and the respective 
prototypes are identical. 



4 Experimental Results 

To show the usefulness of these methods, two experiments with synthetic interval
data sets of different degrees of clustering difficulty (clusters of different shapes
and sizes, linearly non-separable clusters, etc.) and an application with a real
data set are considered.

The experiments with artificial data sets have three stages: generation of
usual data and of interval data (stages 1 and 2), and evaluation of the clustering
results in the framework of a Monte Carlo experiment (stage 3). In each
experiment, we initially considered two standard quantitative data sets in $\Re^2$.
Each data set has 450 points scattered among four clusters of unequal sizes and
shapes: two clusters with elliptical shapes of size 150 each and two clusters with
spherical shapes of sizes 50 and 100. Each data point $(z_1, z_2)$ of each of these
artificial quantitative data sets is the seed of a vector of intervals (a rectangle):
$([z_1 - \gamma_1/2, z_1 + \gamma_1/2], [z_2 - \gamma_2/2, z_2 + \gamma_2/2])$. The parameters $\gamma_1, \gamma_2$ are
randomly selected from the same predefined interval. The intervals considered in
this paper are: [1, 8], [1, 16], [1, 24], [1, 32], and [1, 40].
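The construction of a rectangle from a seed point can be sketched as follows. The text only says that the widths are "randomly selected" from the predefined interval; drawing them from a uniform distribution is an assumption of this example, as are the function name and the seed value.

```python
import random


def seed_to_rectangle(z1, z2, gamma_range, rng):
    """Build ([z1 - g1/2, z1 + g1/2], [z2 - g2/2, z2 + g2/2]) around a seed point.

    The widths g1 and g2 are drawn independently from `gamma_range`;
    the uniform distribution is an assumption, not stated in the text.
    """
    g1 = rng.uniform(*gamma_range)
    g2 = rng.uniform(*gamma_range)
    return ((z1 - g1 / 2, z1 + g1 / 2), (z2 - g2 / 2, z2 + g2 / 2))


rng = random.Random(0)
rect = seed_to_rectangle(28.0, 22.0, (1, 8), rng)
width_1 = rect[0][1] - rect[0][0]
width_2 = rect[1][1] - rect[1][0]
```

Larger predefined intervals such as [1, 40] produce wider rectangles around the same seeds, which increases the overlap between clusters.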

4.1 Experiment with Synthetic Data Set 1 

In this experiment, the data points of each cluster in each quantitative data
set were drawn according to a bi-variate normal distribution with correlated
components. Data set 1, showing well-separated clusters, is generated according
to the following parameters:

a) Class 1: $\mu_1 = 28$, $\mu_2 = 22$, $\sigma_1^2 = 100$, $\sigma_{12} = 21$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.7$;

b) Class 2: $\mu_1 = 65$, $\mu_2 = 30$, $\sigma_1^2 = 9$, $\sigma_{12} = 28.8$, $\sigma_2^2 = 144$ and $\rho_{12} = 0.8$;

c) Class 3: $\mu_1 = 45$, $\mu_2 = 42$, $\sigma_1^2 = 9$, $\sigma_{12} = 6.3$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.7$;

d) Class 4: $\mu_1 = 38$, $\mu_2 = -1$, $\sigma_1^2 = 25$, $\sigma_{12} = 20$, $\sigma_2^2 = 25$ and $\rho_{12} = 0.8$;
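Seed points with a prescribed correlation can be drawn from two independent standard normal variates. The sketch below is one common way to do this and is not claimed to be the generator used by the authors; the function name is hypothetical, and the sample size of 150 is illustrative since the text does not say which class has which size.

```python
import random
from math import sqrt


def sample_bivariate_normal(mu1, mu2, var1, var2, rho, n, rng):
    """Draw n points from a bivariate normal with the given means,
    variances and correlation coefficient rho."""
    s1, s2 = sqrt(var1), sqrt(var2)
    points = []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = mu1 + s1 * u
        # Mixing u and v gives z2 a correlation of rho with z1.
        z2 = mu2 + s2 * (rho * u + sqrt(1 - rho ** 2) * v)
        points.append((z1, z2))
    return points


rng = random.Random(0)
class1 = sample_bivariate_normal(28, 22, 100, 9, 0.7, 150, rng)

# The listed covariance is consistent with sigma_12 = rho_12 * sigma_1 * sigma_2.
sigma_12 = 0.7 * sqrt(100) * sqrt(9)
```

For Class 1 this relation gives 0.7 x 10 x 3 = 21, which matches the covariance listed for that class.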

Data set 2, showing overlapping clusters, is generated according to the
following parameters:

a) Class 1: $\mu_1 = 45$, $\mu_2 = 22$, $\sigma_1^2 = 100$, $\sigma_{12} = 21$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.7$;

b) Class 2: $\mu_1 = 65$, $\mu_2 = 30$, $\sigma_1^2 = 9$, $\sigma_{12} = 28.8$, $\sigma_2^2 = 144$ and $\rho_{12} = 0.8$;

c) Class 3: $\mu_1 = 57$, $\mu_2 = 38$, $\sigma_1^2 = 9$, $\sigma_{12} = 6.3$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.7$;

d) Class 4: $\mu_1 = 42$, $\mu_2 = 12$, $\sigma_1^2 = 25$, $\sigma_{12} = 20$, $\sigma_2^2 = 25$ and $\rho_{12} = 0.8$;

Figure 1 shows, respectively, interval data set 1 with well-separated clusters
and interval data set 2 with overlapping clusters.




Fig. 1 . Symbolic data showing well-separated classes and overlapping classes 



4.2 Experiment with Synthetic Data Set 2 

Here, the data points of each cluster in each quantitative data set were also
drawn according to a bi-variate normal distribution, but with uncorrelated
components. Data set 3, showing well-separated clusters, is generated according
to the following parameters:

a) Class 1: $\mu_1 = 28$, $\mu_2 = 22$, $\sigma_1^2 = 100$, $\sigma_{12} = 0$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.0$;

b) Class 2: $\mu_1 = 67$, $\mu_2 = 30$, $\sigma_1^2 = 9$, $\sigma_{12} = 0$, $\sigma_2^2 = 144$ and $\rho_{12} = 0.0$;

c) Class 3: $\mu_1 = 45$, $\mu_2 = 45$, $\sigma_1^2 = 9$, $\sigma_{12} = 0$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.0$;

d) Class 4: $\mu_1 = 38$, $\mu_2 = -7$, $\sigma_1^2 = 25$, $\sigma_{12} = 0$, $\sigma_2^2 = 25$ and $\rho_{12} = 0.0$;

Data set 4, showing overlapping clusters, is generated according to the
following parameters:

a) Class 1: $\mu_1 = 45$, $\mu_2 = 22$, $\sigma_1^2 = 100$, $\sigma_{12} = 0$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.0$;

b) Class 2: $\mu_1 = 60$, $\mu_2 = 30$, $\sigma_1^2 = 9$, $\sigma_{12} = 0$, $\sigma_2^2 = 144$ and $\rho_{12} = 0.0$;

c) Class 3: $\mu_1 = 52$, $\mu_2 = 38$, $\sigma_1^2 = 9$, $\sigma_{12} = 0$, $\sigma_2^2 = 9$ and $\rho_{12} = 0.0$;

d) Class 4: $\mu_1 = 42$, $\mu_2 = 12$, $\sigma_1^2 = 25$, $\sigma_{12} = 0$, $\sigma_2^2 = 25$ and $\rho_{12} = 0.0$;

Figure 2 shows, respectively, interval data set 3 with well-separated clusters
and interval data set 4 with overlapping clusters.

Fig. 2. Symbolic data showing well-separated classes and overlapping classes

4.3 The Monte Carlo Experiment

The evaluation of these clustering methods was performed in the framework of a
Monte Carlo experiment: 100 replications are considered for each interval data set,
as well as for each predefined interval. In each replication a clustering method
is run 50 times (each time until convergence to a stationary value of the adequacy
criterion $W_1$ or $W_2$) and the best result, according to the criterion $W_1$ or $W_2$,
is selected.

Remark. As in the standard Mahalanobis (adaptive and non-adaptive) distance
methods for dynamic clustering, these methods sometimes have a problem with the
inversion of matrices. When this occurs, the current version of these algorithms
stops the current iteration and re-starts a new one. The stopped iteration is not
taken into account among the 50 which should be run.

The average of the corrected Rand (CR) index [6] over these 100 replications
is calculated. The CR index assesses the degree of agreement (similarity)
between an a priori partition (in our case, the partition defined by the seed
points) and a partition furnished by the clustering algorithm. CR can take
values in the interval [-1, 1], where the value 1 indicates a perfect agreement
between the partitions, whereas values near 0 (or negative values) correspond to
cluster agreements found by chance [8].
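For readers who want to reproduce the index, here is a minimal sketch. It assumes that the CR index of [6] is the usual chance-corrected Rand index computed from the contingency table of the two partitions; the function name is illustrative, and the case where both partitions are trivial (zero denominator) is not handled.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_a, labels_b):
    """Chance-corrected Rand index between two partitions of the same objects."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))           # contingency table n_ij
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)        # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


same = corrected_rand([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, labels renamed
crossed = corrected_rand([0, 0, 1, 1], [0, 1, 0, 1])   # every pair disagrees
```

The index ignores label names, so renaming the clusters leaves it at 1, while the crossed partition gives a negative value.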

Tables 1 and 2 show the values of the average CR index according to the
different methods and interval data sets. These tables also show the (null
and alternative) hypotheses and the observed values of the test statistics, which
follow a Student's t distribution with 99 degrees of freedom.



Table 1. Comparison between the clustering methods with interval data sets 1 and 2

Range of values                  Interval Data Set 1                              Interval Data Set 2
of $\gamma_i$, $i = 1, 2$     Non-Adaptive  Adaptive  $H_0: \mu_1 \leq \mu$    Non-Adaptive  Adaptive  $H_0: \mu_1 \leq \mu$
                              Method        Method    $H_a: \mu_1 > \mu$       Method        Method    $H_a: \mu_1 > \mu$
$\gamma_i \in [1, 8]$         0.778         0.996     80.742                   0.409         0.755     13.266
$\gamma_i \in [1, 16]$        0.784         0.986     82.182                   0.358         0.688     22.609
$\gamma_i \in [1, 24]$        0.789         0.963     61.464                   0.352         0.572     20.488
$\gamma_i \in [1, 32]$        0.802         0.937     39.181                   0.349         0.435     18.204
$\gamma_i \in [1, 40]$        0.805         0.923     29.084                   0.341         0.386     9.2851
Table 2. Comparison between the clustering methods with interval data sets 3 and 4

Range of values                  Interval Data Set 3                              Interval Data Set 4
of $\gamma_i$, $i = 1, 2$     Non-Adaptive  Adaptive  $H_0: \mu_1 \leq \mu$    Non-Adaptive  Adaptive  $H_0: \mu_1 \leq \mu$
                              Method        Method    $H_a: \mu_1 > \mu$       Method        Method    $H_a: \mu_1 > \mu$
$\gamma_i \in [1, 8]$         0.779         0.995     70.618                   0.365         0.530     22.360
$\gamma_i \in [1, 16]$        0.789         0.995     57.295                   0.378         0.497     18.207
$\gamma_i \in [1, 24]$        0.792         0.995     52.325                   0.388         0.474     18.289
$\gamma_i \in [1, 32]$        0.793         0.988     43.664                   0.398         0.449     8.799
$\gamma_i \in [1, 40]$        0.806         0.974     28.914                   0.373         0.389     2.177



As the interval data set used to calculate the CR index by each method
in each replication is exactly the same, the comparison between the proposed
clustering methods is achieved by the paired Student's t-test at a significance
level of 5%. In these tests, $\mu_1$ and $\mu$ are, respectively, the average of the CR
index for adaptive and non-adaptive methods.
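The statistic of a paired Student's t-test is the mean of the per-replication differences divided by its standard error. The sketch below shows that computation on made-up CR values; the four numbers per method are invented for illustration and are not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(adaptive, non_adaptive):
    """t statistic of the paired differences (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(adaptive, non_adaptive)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Invented CR values for four replications (illustration only).
cr_adaptive = [0.90, 0.80, 0.95, 0.85]
cr_non_adaptive = [0.70, 0.70, 0.80, 0.80]
t = paired_t_statistic(cr_adaptive, cr_non_adaptive)
```

With 100 replications the statistic has 99 degrees of freedom, and for a one-sided test at the 5% level the null hypothesis is rejected when it exceeds a critical value of approximately 1.66.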

From the results in Tables 1 and 2, it can be seen that the average CR indices
for the adaptive method are greater than those for the non-adaptive method in
all situations. In addition, the statistical tests support the hypothesis that the
average performance (measured by the CR index) of the adaptive method is
superior to that of the non-adaptive method.

4.4 Application with Real Data 

A data set with 33 car models described by 8 interval variables is used in this 
application. These car models are grouped in two a priori clusters of unequal 
sizes: one cluster (Utilitarian or Berlina) of size 18 and another cluster (Sporting 
or Luxury) of size 13. The interval variables are: Price, Engine Capacity, Top 
Speed, Acceleration, Step, Length, Width and Height. 

The CR indices obtained from the clustering results are, respectively, 0.242
and 0.126 for the adaptive and non-adaptive methods. From these results, we can
conclude that, for this data set, the adaptive method surpasses the non-adaptive
method with respect to this clustering quality measure.



5 Conclusions 

In this paper, dynamic cluster methods for interval data are presented. Two
methods are considered: the first method furnishes a partition of the input data
and a corresponding prototype for each class by optimizing an adequacy criterion
which is based on Mahalanobis distances between vectors of intervals. The second
is an adaptive version of the first method. In both methods the prototype of each
class is represented by a vector of intervals, where the bounds of these intervals
for a variable are, respectively, the average of the set of lower bounds and the
average of the set of upper bounds of the intervals of the objects belonging to
the class for the same variable.
The convergence of these algorithms and the decrease of their partitioning
criteria at each iteration are due to the optimization of their adequacy criteria.
The accuracy of the results furnished by these clustering methods was
assessed by the corrected Rand index considering synthetic interval data sets in
the framework of a Monte Carlo experiment and an application with a real data
set. Concerning the average CR index for synthetic interval data sets, the method
with adaptive distance clearly outperforms the method with non-adaptive
distance. This was also the case for the car data set.

Acknowledgments. The authors would like to thank CNPq (Brazilian Agency) 
for its financial support. 
Abstract. In this paper, a classifier for quantitative feature values
(intervals and/or points) based on a region oriented symbolic approach is
proposed. In the learning step, each class is described by a region (or a
set of regions) in $\Re^p$ defined by a convex hull. To assign a new object to a
class, a dissimilarity matching function compares the class description (a
region or a set of regions) with a point in $\Re^p$. Experiments with two
artificial data sets generated according to bi-variate normal distributions
have been performed in order to show the usefulness of this classifier.
The proposed classifier is evaluated according to its prediction accuracy
(error rate), speed and storage measurements, computed through a Monte
Carlo simulation method with 100 replications.



1 Introduction 

Supervised classification aims to construct classification rules from examples
with known class structures, which allow new objects to be assigned to classes [4].
With the explosive growth in the use of databases, new classification approaches
have been proposed. Symbolic Data Analysis (SDA) [1] has been introduced as
a new domain related to multivariate analysis, pattern recognition and artificial
intelligence in order to extend classical exploratory data analysis and statistical
methods to symbolic data. Symbolic data allow multiple (sometimes weighted)
values for each variable, which is why new variable types (interval, categorical
multi-valued and modal variables) have been introduced. These new variables
make it possible to take into account the variability and/or uncertainty present
in the data.

Ichino et al. [5] and Yaguchi et al. [8] introduced a symbolic classifier as a 
region oriented approach for quantitative, categorical, interval and categorical 
multi-valued data. This approach is an adaptation of the concept of mutual 
neighbours introduced in [3] to define the concepts of mutual neighbours between 
symbolic data and Mutual Neighbourhood Graph between groups. At the end of 
the learning step, the symbolic description of each group is obtained through
the use of an approximation of a Mutual Neighbourhood Graph (MNG) and a
symbolic join operator. In the allocation step, an observation is assigned to a
particular group based on a dissimilarity matching function.



C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 464—473, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 
Souza et al. [7] and De Carvalho et al. [2] have proposed another MNG
approximation to reduce the complexity of the learning step without losing
classifier performance in terms of prediction accuracy. Concerning the allocation
step, a classification rule, based on new similarity and dissimilarity measures,
has been introduced to carry out the assignment of an individual to a class.

In this paper, we introduce a new symbolic classifier for symbolic data based
on a region oriented approach. Here, we are concerned with symbolic data that
are represented by quantitative feature vectors. More general symbolic data types
can be found in [1]. Let $\Omega = \{\omega_1, \ldots, \omega_n\}$ be a set of $n$ individuals described
by $p$ quantitative features $X_j$ ($j = 1, \ldots, p$). Each individual $\omega_i$ ($i = 1, \ldots, n$)
is represented by a quantitative feature vector $x_i = (x_{i1}, \ldots, x_{ip})$, where $x_{ij}$
is a quantitative feature value. A quantitative feature value may be either a
continuous value (e.g., $x_{ij} = 1.80$ meters of height) or an interval value (e.g.,
$x_{ij} = [0, 2]$ hours, the duration of a student evaluation).

Table 1 shows an example of a data table, where each individual is
represented by two continuous feature values $x_1$ and $x_2$ and a categorical feature
that represents its class.

At the end of the learning step, the description of each class is a region (or a set
of regions) in $\Re^p$ defined by the convex hull of the objects belonging to this class,
which is obtained through a suitable approximation of a Mutual Neighbourhood
Graph (MNG). This approach aims to reduce the over-generalization that is
produced when each class is described by a region (or a set of regions) defined
by the hyper-cube formed by the objects belonging to this class, and thereby to
improve the accuracy of the classifier. The assignment of a new
object to a class is based on a dissimilarity matching function which compares
the class description with a point in $\Re^p$.

Section 2 presents the concepts of regions and graphs. In sections 3 and
4, the learning and allocation steps of this symbolic classifier are presented,
respectively. In order to show the usefulness of this approach, two artificial data
sets generated according to bi-variate normal distributions are classified. To
evaluate the performance of the new classifier, prediction accuracy, speed and
storage measurements are computed in the framework of a Monte Carlo experiment.
Section 5 describes the experiments and the performance analysis and section 6
gives the concluding remarks.



Table 1. A data table for n = 6 individuals and p = 2 continuous values

Individuals    Continuous Feature Values    Class
               $x_1$         $x_2$
$\omega_1$     0.189566      -0.080602      0
$\omega_2$     0.203821      -0.390586      0
$\omega_3$     0.448117      2.672457       1
$\omega_4$     0.477598      3.188989       1
$\omega_5$     4.102095      2.757729       2
$\omega_6$     2.355855      3.749226       2
Fig. 1. (a) J-region, (b) H-region, (c) over-generalization



2 Concepts of Regions and Graphs 

Let $C_k = \{\omega_{k1}, \ldots, \omega_{kN_k}\}$, $k = 1, \ldots, m$, be a class of individuals with
$C_k \cap C_{k'} = \emptyset$ if $k \neq k'$ and $\cup_{k=1}^{m} C_k = \Omega$. The individual $\omega_{kl}$, $l = 1, \ldots, N_k$,
is represented by the continuous feature vector $x_{kl} = (x_{kl1}, \ldots, x_{klp})$.

A symbolic description of the class C k can be obtained by using the join 
operator (Ichino et al (1996)). 

Definition 1. The join between the continuous feature vectors $x_{kl}$ ($l = 1, \ldots, N_k$)
is an interval feature vector which is defined as
$y_k = x_{k1} \oplus \ldots \oplus x_{kN_k} = (x_{k11} \oplus \ldots \oplus x_{kN_k1}, \ldots, x_{k1j} \oplus \ldots \oplus x_{kN_kj}, \ldots, x_{k1p} \oplus \ldots \oplus x_{kN_kp})$,
where $x_{k1j} \oplus \ldots \oplus x_{kN_kj} = [\min\{x_{k1j}, \ldots, x_{kN_kj}\}, \max\{x_{k1j}, \ldots, x_{kN_kj}\}]$ ($j = 1, \ldots, p$).

Moreover, we can associate to each class $C_k$ two regions in $\Re^p$: one spanned by
the join of its elements and another spanned by the convex hull of its elements.
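The join of Definition 1 is a component-wise minimum and maximum. A minimal sketch, with a hypothetical function name and the two class-0 individuals of Table 1 as input:

```python
def join(vectors):
    """Join of continuous feature vectors: one [min, max] interval per feature."""
    p = len(vectors[0])
    return [(min(v[j] for v in vectors), max(v[j] for v in vectors))
            for j in range(p)]


# The two individuals of class 0 in Table 1.
class_0 = [(0.189566, -0.080602), (0.203821, -0.390586)]
y_0 = join(class_0)
```

The result is the vector of intervals ([0.189566, 0.203821], [-0.390586, -0.080602]), that is, the smallest axis-aligned box containing both individuals.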

Definition 2. The J-region associated to class $C_k$ is a region in $\Re^p$ which is
spanned by the join of the objects belonging to class $C_k$ and it is defined as
$R_J(C_k) = \{x \in \Re^p : \min\{x_{k1j}, \ldots, x_{kN_kj}\} \leq x_j \leq \max\{x_{k1j}, \ldots, x_{kN_kj}\}, j = 1, \ldots, p\}$.
The volume associated to the hyper-cube defined by the region $R_J(C_k)$
is $\pi(R_J(C_k))$.

Definition 3. The H-region associated to class $C_k$ is a region in $\Re^p$ which is
spanned by the convex hull formed by the objects belonging to class $C_k$ and
it is defined as $R_H(C_k) = \{x = (x_1, \ldots, x_j, \ldots, x_p) \in \Re^p : x$ is inside the
envelope of the convex hull defined by the continuous feature vectors
$x_{kl} = (x_{kl1}, \ldots, x_{klp}), l = 1, \ldots, N_k\}$. The volume associated to the internal points
inside the convex hull envelope defined by $R_H(C_k)$ is $\pi(R_H(C_k))$.

Figure 1 illustrates the description of a class by a J-region and by an
H-region. It is clear that the representation based on a J-region (see Ichino et al
(1996), Souza et al (1999) and De Carvalho et al (2000)) over-generalizes the
class description given by an H-region. This is why the H-region is used in this
paper.

The mutual neighbourhood graph (MNG) (Ichino et al (1996)) yields infor- 
mation about interclass structure. 
Definition 4. The objects belonging to class $C_k$ are mutual neighbours of each other
(Ichino et al (1996)) if $\forall \omega_{k'l} \in C_{k'}$ ($k' \in \{1, \ldots, m\}, k' \neq k$), $x_{k'l} \notin R_J(C_k)$
($l = 1, \ldots, N_{k'}$). In this case, the MNG of $C_k$ against $\bar{C}_k = \cup_{k'=1, k' \neq k}^{m} C_{k'}$, which
is constructed by joining all pairs of objects which are mutual neighbours, is a
complete graph.

If the objects belonging to class $C_k$ are not all mutual neighbours, we
look for all the subsets of $C_k$ whose elements are mutual neighbours of each other
and which form a maximal clique in the MNG, which, in that case, is not a
complete graph. To each of these subsets of $C_k$ we can associate a J-region and
calculate the volume of the corresponding hyper-cube defined by it.

In this paper we introduce another definition of the MNG.

Definition 5. The objects belonging to class $C_k$ are mutual neighbours of each other
if $\forall \omega_{k'l} \in C_{k'}$, $k' \in \{1, \ldots, m\}$, $k' \neq k$, $x_{k'l} \notin R_H(C_k)$ ($l = 1, \ldots, N_{k'}$). The MNG
of $C_k$ against $\bar{C}_k = \cup_{k'=1, k' \neq k}^{m} C_{k'}$ defined in this way is also a complete graph.

If the objects belonging to class $C_k$ are not all mutual neighbours, again
we look for all the subsets of $C_k$ whose elements are mutual neighbours of each
other and which form a maximal clique in the MNG, and to each of these subsets
of $C_k$ we can associate an H-region and calculate the volume of the corresponding
convex hull defined by it.



3 Learning Step 

This step aims to provide a description of each class through a region (or a set
of regions) in $\Re^p$ defined by the convex hull formed by the objects belonging
to this class, which is obtained through a suitable approximation of a Mutual
Neighbourhood Graph (MNG). Concerning this step, we have two basic remarks:

a) The first is that a difficulty arises when the MNG of a class $C_k$ is not
   complete. In this case, we look for all the subsets of $C_k$ whose elements form
   a maximal clique in the MNG. However, it is well known that the time
   complexity of finding all maximal cliques of a graph is exponential, so
   the construction of an approximation of the MNG is necessary.

b) The second concerns the kind of region that is suitable to describe a class C k . 



3.1 The Learning Algorithm 

The construction of the MNG for the classes $C_k$ ($k = 1, \ldots, m$) and the
representation of each class by an H-region (or by a set of H-regions) is accomplished
in the following way:

For $k = 1, \ldots, m$ do

1 Find the region $R_H(C_k)$ (according to definition 3) associated to class
  $C_k$ and verify if the objects belonging to this class are mutual
  neighbours of each other according to definition 5

2 If it is the case, construct the MNG (which is a complete graph) and stop

3 If it is not the case (MNG approximation) do:

  3.1 choose an object of $C_k$ as a seed according to the lexicographic order
      of these objects in $C_k$; do $t = 1$ and put the seed in the set $C_k^t$;
      remove the seed from $C_k$

  3.2 add the next object of $C_k$ (according to the lexicographic order) to
      $C_k^t$ if all the objects belonging now to $C_k^t$ remain mutual
      neighbours of each other according to definition 5; if this is true, remove
      this object from $C_k$

  3.3 repeat step 3.2) for all remaining objects in $C_k$

  3.4 find the region $R_H(C_k^t)$ (according to definition 3) associated to $C_k^t$

  3.5 if $C_k \neq \emptyset$, do $t = t + 1$ and repeat steps 3.1) to 3.4) until $C_k = \emptyset$

4 construct the MNG (which now is not a complete graph) and stop

At the end of this algorithm the subsets $C_k^1, \ldots, C_k^{n_k}$ of class $C_k$ have been
computed and the class is described by the H-regions $R_H(C_k^1), \ldots, R_H(C_k^{n_k})$.
Figure 2 shows an example for the case of two classes.
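The greedy approximation of step 3 can be sketched in two dimensions with a standard convex hull routine. Everything below is an illustration under stated assumptions: the hull is computed with Andrew's monotone chain algorithm (a swapped-in, generic technique), points on the hull boundary are counted as inside, degenerate hulls of one or two points are treated as containing only their own vertices, and the names `split_class` and `inside` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, q):
    """True if q lies in the closed convex polygon `hull`."""
    if len(hull) < 3:
        return q in hull  # simplification for degenerate hulls
    return all(cross(hull[i], hull[(i + 1) % len(hull)], q) >= 0
               for i in range(len(hull)))


def split_class(objects, others):
    """Greedily split `objects` into groups whose hulls contain no point of `others`."""
    remaining = sorted(objects)              # lexicographic order
    groups = []
    while remaining:
        group = [remaining.pop(0)]           # step 3.1: seed of a new subset
        for obj in list(remaining):          # steps 3.2 and 3.3
            hull = convex_hull(group + [obj])
            if not any(inside(hull, q) for q in others):
                group.append(obj)
                remaining.remove(obj)
        groups.append(group)                 # step 3.4: its H-region is convex_hull(group)
    return groups


groups = split_class([(0, 0), (4, 0), (0, 4), (4, 4)], [(2, 1)])
```

In this example the point (2, 1) of another class lies inside the hull of all four objects, so the class is split into the subsets {(0, 0), (0, 4), (4, 4)} and {(4, 0)}, whose hulls no longer contain it.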




Fig. 2. (a) MNG approximation and J-sets (Ichino et al (1996)), (b) MNG approxima- 
tion and J-sets (Souza et al (1999), De Carvalho et al (2000)), (c) MNG approximation 
and H-sets (proposed approach) 



4 Allocation Step 

The aim of the allocation step is to associate a new object to a class based on a
dissimilarity matching function that compares the class description (a region or
a set of regions) with a point in $\Re^p$.

Let $\omega$ be a new object, which is a candidate to be assigned to a class $C_k$
($k = 1, \ldots, m$), with its corresponding description given by the continuous feature
vector $x = (x_1, \ldots, x_p)$. Recall that the learning step computes
the subsets $C_k^1, \ldots, C_k^{n_k}$ of $C_k$.

The classification rule is defined as follows: $\omega$ is assigned to the class $C_k$ if

$$\delta(\omega, C_k) \leq \delta(\omega, C_h), \ \forall h \in \{1, \ldots, m\} \qquad (1)$$

where $\delta(\omega, C_h) = \min\{\delta(\omega, C_h^1), \ldots, \delta(\omega, C_h^{n_h})\}$.
Fig. 3. Differences in area

In this paper, the dissimilarity matching function $\delta$ is defined as

$$\delta(\omega, C_h^s) = \frac{\pi(R_H(C_h^s \cup \{\omega\})) - \pi(R_H(C_h^s))}{\pi(R_H(C_h^s \cup \{\omega\}))} \qquad (2)$$

Example: Consider two classes: $C_k = \{\omega_1, \omega_2, \omega_3, \omega_4\}$ where $x_1 = (1, 2)$,
$x_2 = (3, 2)$, $x_3 = (1, 4)$, $x_4 = (3, 4)$ and $C_h = \{\omega_5, \omega_6, \omega_7\}$ where $x_5 = (6, 6)$,
$x_6 = (7, 3)$, $x_7 = (8, 5)$. Let $\omega$ be a new individual where $x_\omega = (5, 5)$. Using the
function given by equation (2), we obtain $\delta(\omega, C_k) = 0.43$ and $\delta(\omega, C_h) = 0.44$.
According to the classification rule, the new individual $\omega$ is assigned to
class $C_k$. Figures 3(a) and 3(b) show the differences in area if the individual $\omega$
is associated to $C_k$ ($C_k \cup \{\omega\}$) or to $C_h$ ($C_h \cup \{\omega\}$).
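The two values of the example can be reproduced with a convex hull and the shoelace area formula. The sketch below is restricted to two dimensions, uses Andrew's monotone chain as a generic hull routine, and assumes the enlarged hull has non-zero area; the function names are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(hull):
    """Shoelace formula; returns 0 for hulls with fewer than three vertices."""
    n = len(hull)
    twice = sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
                for i in range(n))
    return abs(twice) / 2


def dissimilarity(group, x):
    """Equation (2): relative growth of the hull area when x is added to the group."""
    grown = polygon_area(convex_hull(group + [x]))
    return (grown - polygon_area(convex_hull(group))) / grown


c_k = [(1, 2), (3, 2), (1, 4), (3, 4)]
c_h = [(6, 6), (7, 3), (8, 5)]
x_new = (5, 5)
delta_k = dissimilarity(c_k, x_new)   # (7 - 4) / 7
delta_h = dissimilarity(c_h, x_new)   # (4.5 - 2.5) / 4.5
```

The hull of $C_k$ grows from area 4 to 7 and that of $C_h$ from 2.5 to 4.5, which gives 3/7 and 2/4.5, that is 0.43 and 0.44 after rounding, in agreement with the example.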



5 Experiments and Performance Analysis 

Experiments with two artificial quantitative data sets in $\Re^2$ and a corresponding
performance analysis of the classifier introduced in this paper are considered in
this section. In these experiments, 100 replications of each set with identical
statistical properties are obtained for each one of the training and test sets. The
data points of each cluster in each data set were drawn according to a bi-variate
normal distribution with correlated components.

5.1 Experiment 1 

In this experiment, the data set has 300 points scattered among three clusters
of equal sizes and unequal shapes: two clusters with elliptical shapes and one
cluster with a spherical shape, each of size 100.

Data set 1 (Fig. 4) with three clusters is generated according to the following
parameters:

a) Class 1: $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1^2 = 4$, $\sigma_{12} = 1.7$, $\sigma_2^2 = 1$ and $\rho_{12} = 0.85$;

b) Class 2: $\mu_1 = 0$, $\mu_2 = 3$, $\sigma_1^2 = 0.25$, $\sigma_{12} = 0.0$, $\sigma_2^2 = 0.25$ and $\rho_{12} = 0.0$;

c) Class 3: $\mu_1 = 4$, $\mu_2 = 3$, $\sigma_1^2 = 4$, $\sigma_{12} = -1.7$, $\sigma_2^2 = 1$ and $\rho_{12} = -0.85$;
Fig. 4. Quantitative data with three classes



5.2 Experiment 2 

Here, the data set has 500 points scattered among five clusters of equal sizes and
unequal shapes: three clusters with elliptical shapes and two clusters with
spherical shapes, each of size 100.

Data set 2 (Fig. 5) with five clusters is generated according to the following
parameters:

a) Class 1: $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1^2 = 4$, $\sigma_{12} = 1.7$, $\sigma_2^2 = 1$ and $\rho_{12} = 0.85$;

b) Class 2: $\mu_1 = 0$, $\mu_2 = 3$, $\sigma_1^2 = 0.25$, $\sigma_{12} = 0.0$, $\sigma_2^2 = 0.25$ and $\rho_{12} = 0.0$;

c) Class 3: $\mu_1 = 4$, $\mu_2 = 3$, $\sigma_1^2 = 4$, $\sigma_{12} = -1.7$, $\sigma_2^2 = 1$ and $\rho_{12} = -0.85$;

d) Class 4: $\mu_1 = 8.5$, $\mu_2 = 5.5$, $\sigma_1^2 = 4$, $\sigma_{12} = 1.7$, $\sigma_2^2 = 1$ and $\rho_{12} = 0.85$;

e) Class 5: $\mu_1 = 7.5$, $\mu_2 = 3.0$, $\sigma_1^2 = 0.25$, $\sigma_{12} = 0.0$, $\sigma_2^2 = 0.25$ and $\rho_{12} = 0.0$;

5.3 Performance Analysis 

Our method, called here the H-region approach, where class representation,
MNG approximation and dissimilarity matching function are based
on H-regions, is evaluated on prediction accuracy, speed and storage in
comparison with the approach where class representation, MNG approximation
and dissimilarity matching function are based on J-regions (called here the
J-region approach).

The prediction accuracy of the classifier was measured through the error rate
of classification obtained from a test set. The estimated error rate of classification
corresponds to the average of the error rates found over the 100 replications.
Speed and storage were also assessed from the 100 replications, respectively by
the average time in minutes spent in the learning and allocation steps and the
average memory in k-bytes spent to store the regions.

To compare these methods, we use paired Student's t-tests at the significance
level of 5%. Tables 2, 3 and 4 show, respectively, comparisons between these
classifiers according to the average error rate, the average time and the average
memory for the data sets with three and five classes. In all these tables, the
test statistics follow a Student's t distribution with 99 degrees of freedom and
$\mu_1$ and $\mu_2$ are, respectively, the corresponding averages for the H-region approach
and the J-region approach.

Fig. 5. Quantitative data with five classes

Table 2. Comparison between the classifiers according to the average error rate

Quantitative     H-region approach        J-region approach        Hypothesis                 Decision
data sets        average    standard      average    standard      $H_0: \mu_1 \geq \mu_2$
                 error      deviation     error      deviation     $H_1: \mu_1 < \mu_2$
three classes    2.57       0.991         3.16       1.062         -5.5642                    Reject $H_0$
five classes     4.716      1.086         5.29       1.134         -5.171                     Reject $H_0$

From the values in Table 2, we can conclude that for these data sets the
average error rate for the H-region approach is lower than that for the J-region
approach. Also, from the test statistics we can see that the H-region approach
outperforms the J-region approach on prediction accuracy. From Tables 3 and
4, we can see that the average time and average memory values obtained with the
H-region approach are greater than those obtained with the J-region approach in
both data sets, and paired Student’s t-tests at the significance level of 5%
support the hypothesis that the J-region approach outperforms the H-region
approach on speed and storage.
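The decision rule used in Tables 2 to 4 can be sketched as follows. This is a
minimal illustration of a one-sided paired Student’s t-test, not the authors’
code: the function names are hypothetical, and the critical value for 99 degrees
of freedom (about 1.660 at the 5% level) is hard-coded as an assumption so that
the sketch needs only the standard library.

```python
import math
import statistics

# Assumed one-sided critical value of Student's t with 99 degrees of freedom
# at the 5% significance level (approximately 1.660).
T_CRIT_99_DF = 1.660


def paired_t_statistic(first, second):
    """t statistic of the paired differences first[i] - second[i]."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def decide(t_stat, t_crit=T_CRIT_99_DF):
    """Test H0: mu1 >= mu2 against H1: mu1 < mu2.

    H0 is rejected only when the statistic is sufficiently negative.
    """
    return "Reject H0" if t_stat < -t_crit else "Accept H0"
```

Applied to the statistics reported in the tables, `decide(-5.5642)` rejects the
null hypothesis (error rate, three classes), while `decide(44.4236)` does not
(time, three classes).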

The time complexity of the H-region approach is $O(n^2 \log n)$ (see [6]),
whereas that of the J-region approach is $O(n^2)$ (see [2]), $n$ being the
cardinality of the learning set. In conclusion, concerning speed and storage,
the J-region approach clearly outperforms the H-region approach as the size of
the data set grows. However, the H-region approach presents better results
concerning prediction accuracy.

Table 3. Comparison between the classifiers according to the average time

Quantitative    H-region approach      J-region approach      Hypothesis                  Decision
data sets       average    standard    average    standard    $H_0: \mu_1 \geq \mu_2$
                time (min) deviation   time (min) deviation   $H_1: \mu_1 < \mu_2$
three classes   23.93      3.66        7.29       0.47        44.4236                     Accept $H_0$
five classes    63.06      7.46        12.33      0.47        68.088                      Accept $H_0$



Table 4. Comparison between the classifiers according to the average memory

Quantitative    H-region approach       J-region approach       Hypothesis                  Decision
data sets       average     standard    average     standard    $H_0: \mu_1 \geq \mu_2$
                memory (kB) deviation   memory (kB) deviation   $H_1: \mu_1 < \mu_2$
three classes   44110       271.8       27441       184         817.993                     Accept $H_0$
five classes    73859       405.5       46037       255.8       975.198                     Accept $H_0$



6 Concluding Remarks 

A classifier for quantitative feature values based on a region-oriented symbolic
approach is presented in this paper. The input of this classifier is a set of
continuous feature vectors. In the learning step, each class is described by a
region (or a set of regions) in $\Re^p$ defined by the convex hull formed by the
objects belonging to this class, which is obtained through a suitable
approximation of a Mutual Neighbourhood Graph (MNG). The goal of this step is to
reduce the over-generalization that is produced when each class is described by
a region (or a set of regions) in $\Re^p$ defined by the hyper-cube formed by the
objects belonging to this class, and in this way to improve the classifier’s
prediction accuracy. In the allocation step, to assign a new object to a class,
a dissimilarity matching function, which compares a class description (a region
or a set of regions) with an individual description (a point in $\Re^p$), was
introduced.

To show the usefulness of this approach, two artificial quantitative data sets
in $\Re^2$, presenting three and five classes, were considered. These data sets
were drawn according to bi-variate normal distributions with correlated
components. The evaluation of the proposed classifier was based on prediction
accuracy and on speed and storage measurements, in comparison with the
classifier that uses the J-region in the learning and allocation steps. The
prediction accuracy was evaluated according to the classification error rate.
These measurements were calculated in the framework of a Monte Carlo experiment
with 100 replications and were obtained from independent test sets. The results
showed that, concerning the speed and storage measurements, the J-region
approach outperforms the H-region approach. However, the H-region approach
furnishes better results concerning prediction accuracy.

Abstract. This paper reports the results of a rule extraction process
from an Artificial Neural Network (ANN) used to predict the backbone
dihedral angles of proteins based on physical-chemical attributes. By
analyzing the fuzzy inference system extracted from the knowledge acquired
by the ANN, we want to scientifically explain part of the results obtained
by the scientific community when processing the Hydropathy Index and
the Isoelectric Point, and also to show that rule extraction from ANNs
is an important tool that should be used more frequently. To obtain these
results we defined a methodology that allowed us to formulate statistically
supported hypotheses and to conclude that these attributes are not enough
to predict the backbone dihedral angles when processed by an ANN approach.



1 Introduction 

Since Anfinsen [1] first stated that the information necessary for protein
folding resides completely within its primary structure, intensive research has
been done to corroborate this theory. Thirty years later, one of the major goals
of computational biology is still to understand the relationship between the
amino acid sequence and the structure of a protein. If this relationship were
known, then the structure of a protein could be reliably predicted.
Unfortunately this relationship is not that simple.

The last three decades have seen a gradual growth in the complexity of the
computational techniques applied to the problem of protein folding. Among the
various empirical methods based on databases where structures and sequences
are known, Artificial Neural Networks (ANNs) have shown evidence of being a
suitable technique, since they learn from a set of examples and are able to
generalize the acquired knowledge to unseen data. This characteristic makes the
ANN a fitting technique for processing data from domains where there is little
or incomplete knowledge about the problem to be solved but there are enough data
for designing a model.

This search led to variations in the way that the amino acid primary sequence
has been coded. As compiled by Wu & McLarty [2], most works use a variety of
orthogonal encodings to represent each of the 20 amino acids. Other works, on a
smaller scale, use the physical-chemical attributes of the amino acids to
represent them. Unfortunately, the use of these attributes did not bring
improvements in accuracy. Cherkauer & Shavlik [3] observe “... our most
surprising (and disappointing) discovery is that the ability to choose among
thousands of features does not appear to lead to significant gains on the
secondary structure task”. Baldi & Brunak [4] affirm “If one wants to use
real-numbered quantification of residue hydrophobicity, volume, charge, and so
on, one should be aware of the harmful impact it can have on the input space”.

In this context, this paper reports the results obtained by extracting rules
from an ANN used to predict the backbone dihedral angles of proteins (phi
and psi angles) based on physical-chemical attributes. By analyzing the fuzzy
inference system extracted from the ANN when processing the Hydropathy
Index (HP scale) and the Isoelectric Point (pI), we want to scientifically
explain part of the “disappointing” results reported by Cherkauer & Shavlik [3]
and the “harmful impact” cited by Baldi & Brunak [4]. To obtain these results we
defined and followed a methodology composed of five main steps, presented over
the next sections: database definition; network training; rule extraction;
results interpretation; and hypothesis formulation.

This paper is structured as follows. In section 2 we present the database
definition, in which a list of proteins is built and the attributes are defined.
The training process of the ANN is presented in section 3 and the rule
extraction process is introduced in section 4. Section 5 presents the
interpretation of the results and section 6 closes the methodology with the
hypothesis formulation. Finally, in section 7 we present the conclusions.



2 Database Definition 

For the training process we adopted a validated database. The EVA project [5]
is a web-based server that performs a weekly automatic evaluation of the “n”
newest experimental structures added to the Protein Data Bank (PDB)¹. One of its
various outputs is an up-to-date list of non-homologous domains. At the time the
EVA database was accessed, 2,980 non-homologous domains were identified. We
adopted the EVA database because it allows scaling from a smaller to a larger
number of examples as needed.

Once the database is identified we need to define the attributes to be
processed by the ANN and from which the knowledge will be extracted. Among the
various physical-chemical attributes available in the literature we opted for
the Hydropathy Index (HP scale) and the Isoelectric Point (pI). The choice of
these attributes was guided by their importance in the protein folding process,
as reported by most of the classical Biochemistry books (e.g. Lehninger [6],
Voet & Voet [7]).



¹ http://www.rcsb.org/pdb

In Table 1 we can view the amino acids and the original values for the HP and pI
attributes. These values were standardized (i.e. modified to have a mean of 0
and a standard deviation of 1) when submitted to the ANN.

The HP scale, introduced by Kyte & Doolittle [8], measures the degree of
hydrophobicity of amino acid side chains. It can be used to measure the tendency
of an amino acid to seek or avoid an aqueous environment. Lower values indicate
hydrophilic side chains and higher values hydrophobic ones. The pI is based
on the relationship between the net electric charge of the amino acid and the pH
of the solution. Additional explanations can be found in Lehninger [6], Voet &
Voet [7] and others.



Table 1. The amino acids and their values for Hydropathy Index (HP) and
Isoelectric Point (pI)

Letter  Abbreviation  Name            HP      pI
A       ALA           Alanine          1.8     6.01
C       CYS           Cysteine         2.5     5.07
D       ASP           Aspartate       -3.5     2.77
E       GLU           Glutamate       -3.5     3.22
F       PHE           Phenylalanine    2.8     5.48
G       GLY           Glycine         -0.4     5.97
H       HIS           Histidine       -3.2     7.59
I       ILE           Isoleucine       4.5     6.02
K       LYS           Lysine          -3.9     9.74
L       LEU           Leucine          3.8     5.98
M       MET           Methionine       1.9     5.74
N       ASN           Asparagine      -3.5     5.41
P       PRO           Proline         -1.6     6.48
Q       GLN           Glutamine       -3.5     5.65
R       ARG           Arginine        -4.5    10.76
S       SER           Serine          -0.8     5.68
T       THR           Threonine       -0.7     5.87
V       VAL           Valine           4.2     5.97
W       TRP           Tryptophan      -0.9     5.89
Y       TYR           Tyrosine        -1.3     5.66
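The standardization mentioned above can be sketched as a z-score transform of
the Table 1 values. Two points are assumptions of this example and not
statements about the original experiment: the mean and standard deviation are
computed over the 20 table entries (rather than over the residues of the
training set), and the population standard deviation is used.

```python
import statistics

# Hydropathy Index (HP) and Isoelectric Point (pI) values from Table 1.
HP = {"A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
      "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
      "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
      "W": -0.9, "Y": -1.3}
PI = {"A": 6.01, "C": 5.07, "D": 2.77, "E": 3.22, "F": 5.48, "G": 5.97,
      "H": 7.59, "I": 6.02, "K": 9.74, "L": 5.98, "M": 5.74, "N": 5.41,
      "P": 6.48, "Q": 5.65, "R": 10.76, "S": 5.68, "T": 5.87, "V": 5.97,
      "W": 5.89, "Y": 5.66}


def standardize(table):
    """Return a copy of the table rescaled to mean 0 and standard deviation 1."""
    values = list(table.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # population standard deviation (assumed)
    return {residue: (value - mean) / sd for residue, value in table.items()}


HP_STD = standardize(HP)
PI_STD = standardize(PI)
```

After this transform both attributes are on a comparable scale, so neither
dominates the network input merely because of its units.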



3 Training the ANN 

A diagram of the basic network is shown in Figure 1. The processing units are 
arranged in layers, with the input units shown at the top and the output unit 
on the bottom. The units in the input layer are fully connected with the units 
on the hidden layer and with the output unit (i.e. short-cut). The units on the 
hidden layer are fully connected with the unit on the output layer. 

Fig. 1. Architecture of the ANN used

To process the residues belonging to the protein’s primary structure the
windowing technique was used. It consists of a sliding window that is shifted
over the full sequence from left to right. Each time the window is shifted a new
input is generated. The phi and psi angles of the central residue of the window
are the target of prediction.
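The windowing step can be sketched as follows. The sequence, the function names
and the handling of the sequence ends (only complete windows are produced, with
no padding) are assumptions of this example; the attribute values are the raw
Table 1 entries for four residues, used without standardization to keep the
sketch short.

```python
# Raw (HP, pI) pairs from Table 1 for the residues used in this example.
ATTRIBUTES = {
    "A": (1.8, 6.01),
    "I": (4.5, 6.02),
    "K": (-3.9, 9.74),
    "L": (3.8, 5.98),
}


def sliding_windows(sequence, size=13):
    """Yield (index of central residue, window) for every complete window."""
    half = size // 2
    for start in range(len(sequence) - size + 1):
        yield start + half, sequence[start:start + size]


def encode(window, attributes):
    """Concatenate the attribute values of each residue in the window."""
    vector = []
    for residue in window:
        vector.extend(attributes[residue])
    return vector


# Hypothetical 16-residue sequence, used only to exercise the functions.
SEQUENCE = "ALKIALKIALKIALKI"
inputs = [(center, encode(window, ATTRIBUTES))
          for center, window in sliding_windows(SEQUENCE)]
```

Each generated input has 26 values (13 residues with 2 attributes each), and the
torsion angles of the residue at the reported central index would be the
prediction target for that input.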

With the windowing technique we explicitly want to capture local interactions
of the amino acids, with the aim of explaining the behavior of the
physical-chemical attributes over short-range intervals. We are nevertheless
aware that the protein folding process is also influenced by long-range
interactions.

This work implements a network for a function approximation problem in- 
stead of a common classification one. It tries to predict the torsion angle ( phi or 
psi ) instead of the traditional alpha, beta and coil classes. The reason for this 
approach is to offer a suitable result for algorithms that need a starting point 
more precise than the information returned by a class. The learning algorithm 
used in the supervised learning was the Resilient Propagation (RProp). 

After experiments that varied the number of hidden units (0, 1, 2, 4, 8, 16),
the number of residues covered by the window (3, 5, 7, ..., 19) and the seed for
the random initialization of the weights, the final network was defined with a
window size of 13 residues and 4 units in the hidden layer. The window size
of 13 residues leads to an input layer of 26 units (13 residues × 2 attributes).
When varying the window size and the number of hidden units, we noticed a
behavior similar to the one reported by Qian & Sejnowski [9]. The Root Mean
Square (RMS) errors obtained in the training process were 0.540593 for phi and
0.817681 for psi.

4 Rule Extraction 

The rule extraction algorithm used in this work was the Fuzzy Automatically
Generated Neural Inferred System (FAGNIS) [10]. It extracts rules directly from
the trained ANN, guarantees the functional equivalence between the ANN and the
extracted fuzzy inference system, and yields highly comprehensible rules (fewer
and simpler rules).

The main idea behind FAGNIS can be described as partitioning the neural space
into linear regions. This is done like a piecewise linear regression while the
network is being trained. Each linear region found corresponds to a
Takagi-Sugeno rule [11]. Three outputs are reported for each linear region
(rule) generated by FAGNIS: a prototype that characterizes the behavior of the
patterns belonging to the rule; a linear equation, based on the analysis of the
neural space, that gives the output of a pattern belonging to the rule; and a
membership function that computes the membership value of the pattern.

Takagi-Sugeno rules [11] are based on the fuzzy partitioning of the input
space. For each subspace, a linear relation between input and output is
established. The fuzzy inference system with input variables $x_1, \ldots, x_n$
and the output variable $y$, whose value is inferred, contains a rule system of
the form:

$R$: If $f(x_1 \text{ is } F_1, \ldots, x_n \text{ is } F_n)$ Then $y = g(x_1, \ldots, x_n)$    (1)

where: $F_1, \ldots, F_n$ are fuzzy sets with linear membership functions
representing a fuzzy subspace in which the implication $R$ can be applied for
reasoning; $f$ is the logical function that connects the propositions in the
premise; and $g$ is the function that implies the value of $y$ when
$x_1, \ldots, x_n$ satisfy the premises. The linear consequent has a higher
degree of comprehensibility than a nonlinear one (and knowledge acquisition is
an important point in this work).

In Figure 2 we can observe how FAGNIS handles a sigmoid activation unit.
The unit is represented by 3 rules:

If $x$ is $F_1$ Then $y = -1$
If $x$ is $F_2$ Then $y = 0.5x$    (2)
If $x$ is $F_3$ Then $y = +1$

that show an equivalence with the original sigmoid function. Explanations about
how to obtain the membership functions can be found in [10].
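The three consequents of rule system (2) can be illustrated with a small
sketch. Several choices here are assumptions and are not taken from FAGNIS: the
sigmoid is assumed to be the symmetric form tanh(x/2), whose slope at the origin
is 0.5 and whose outputs lie in (-1, 1); the fuzzy sets are replaced by a crisp
partition; and the breakpoints are placed at x = -2 and x = 2, where the line
y = 0.5x meets the two constant consequents.

```python
import math


def sigmoid(x):
    """Symmetric sigmoid with outputs in (-1, 1); assumed form tanh(x / 2)."""
    return math.tanh(x / 2.0)


def three_rule_approximation(x):
    """Crisp version of rule system (2); breakpoints at -2 and 2 are assumed."""
    if x < -2.0:          # x is F1
        return -1.0
    if x > 2.0:           # x is F3
        return 1.0
    return 0.5 * x        # x is F2


# Largest gap between the two functions on a grid over [-8, 8].
max_gap = max(abs(three_rule_approximation(i / 100) - sigmoid(i / 100))
              for i in range(-800, 801))
```

With a crisp partition the gap is largest at the breakpoints; the fuzzy
membership functions described in [10] are what allow the extracted system to
match the original unit more closely than this simplified version does.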

Another feature available in FAGNIS is the statistical approach used to vali- 
date the rules extracted. There is no need for a validation set during the learning 
process of the ANN because FAGNIS uses parameter estimation and confidence 
intervals. The measure used to compute the solution found is based on the num- 
ber of parameters effectively used by the ANN: the number of linear regions and 
the number of parameters of each linear region. 

Complementing the information available for each rule we have: a linear
equation generated from a linear regression of all patterns belonging to the
rule; the percentage of patterns covered by the rule; and the RMS error of the
linear equation, which is used as a measure of how good the solution found is.

Fig. 2. Sigmoid function and the consequent part of each rule (a); implemented
number of membership functions for a given activation unit (b)

The Protein Folding Problem Solved by a Fuzzy Inference System

5 Interpreting the Results 

After training the networks until the error stabilized, we extracted 30 rules
for the psi angle and 21 rules for phi. For each rule, the prototypes that
describe the behavior of the residues (concerning HP and pI) in the input
window are reported. The prototypes are the main knowledge on which the
interpretations and hypotheses are based.

Two approaches can be used to interpret the rules extracted. First, we can 
analyze them by the criterion of rules that aggregate the highest number of 
patterns. Second, rules with the lowest RMS error. Both approaches will be 
covered here. Due to constraints in the article size we will expose the main idea 
related to the rule interpretation and not an exhaustive enumeration of all rules. 

In the first approach, the first 4 rules for the psi angle cover almost 50%
(48.83%) of the database. A linear regression over all patterns reported an RMS
error of 0.866781. None of the extracted rules has a higher RMS error than the
one reported by this “global” linear regression. This means that each rule has a
better prediction capacity than the solution obtained with a linear regression
over all patterns.

We can observe in Figure 3 the prototype that characterizes the patterns
covered by rule number 5. All patterns (i.e. windows of 13 residues) that show
equivalent values for HP and pI will be handled by this rule. Concerning the
influence of HP over the residues, a period of 2 hydrophobic residues (higher
values) followed by 2 hydrophilic ones (lower values) can be observed. Another
point to be observed is the HP of the central residue, which has a high
hydrophobic value. Concerning pI, the neutral characteristic of the central
residue can be observed. As expected, the influence of HP and pI becomes weaker
at the extremes of the window.



Fig. 3. Prototype of rule 5, showing the influence of the Hydropathy Index
and the Isoelectric Point over the patterns covered by the rule

Fig. 4. Hydropathy Index of 4 patterns (#1, #5, #6 and #19) with a high
membership value with respect to rule 5, plotted together with the HP prototype
to corroborate the information given by the prototype

Fig. 5. Threonine tRNA from Escherichia coli, and the corresponding 13 residues
(within the circle) with a cycle reported by the prototype that characterizes
the rule



To corroborate the knowledge given by the prototype we can plot the patterns
that have the highest membership value for this rule. This information is
obtained during the rule extraction process. In Figure 4 we can observe the
HP of 4 patterns with a high membership value with respect to rule 5. The
patterns show a behavior correlated with the prototype.

If we take pattern #1, reported in Figure 3, and isolate the corresponding
window of 13 residues in the protein, we obtain an alpha helix that belongs to
Threonine tRNA from Escherichia coli (code 1QF6 at PDB). In Figure 5 we can
see the full protein structure and the isolated alpha helix, in which the
hydrophobic residues are highlighted in green. Besides the alpha portion we also
show part of a beta strand to demonstrate the alignment of the hydrophobic
residues of these two structures. The cycle reported by the prototype,
concerning the hydrophobicity, can be observed in the alpha helix.

The second approach is to analyze the extracted rules with the lowest RMS
error. The first rule has a considerably low RMS error (0.246627) but covers
just 0.19% of the patterns. This behavior can also be noted for the phi angle.
The explanation for this fact lies in a specialization in the neural space that
was perceived by the rule extraction algorithm and transformed into a rule.
Specialized regions in the neural space tend to cover fewer patterns.

In the very same way we could extend these graphs, tables and explanations
to other rules, to the pI attribute and to the phi angle. Specifically for the
pI, an interesting behavior was observed in some rules, in which positive
and negative values appear symmetrically over the prototype.

6 Hypothesis Formulation 

The patterns used to corroborate the prototype in Figure 3 were not randomly 
chosen. They represent very important points to understand the RMS error ob- 
tained by the rules extracted and to formulate a hypothesis about the knowledge 
acquired. 

Although the top 172 patterns from rule 5 have a membership value higher
than 90%, patterns 1 to 20 by themselves carry relevant knowledge. In Table 2
we can analyze patterns number 1, 5, 6, 19 and 20 and observe an alternation
between angles that lead to alpha and beta regions for patterns with very
similar and high membership values (i.e. the degree of membership of the pattern
to the rule). All patterns from 2 to 4 and from 7 to 18 have angles that lead to
the alpha region.

If we extend this analysis to all patterns from rule 5 we obtain the plot in
Figure 6. Observing the classic Ramachandran plot we can see that the patterns
covered by rule 5 are spread over the typical alpha and beta regions. This plot
was generated using the expected phi and psi torsion angles. This means that
rule 5 clusters, in the input space, patterns that lead to distinct regions in
the output space.

Table 2. Top 20 patterns covered by rule 5 ordered by membership value

Patterns  Membership   phi       psi
1         0.997093     -52.09    -51.39
5         0.994286     -127.88   161.59
6         0.993221     -73.57    -23.06
19        0.981746     -118.25   115.68
20        0.981539     -64.71    -40.96



The knowledge embedded in Table 2 shows that both alpha and beta patterns
have similar values for HP and pI over the processed input window. We can infer
that these attributes alone are not enough to determine the phi and psi
torsion angles (or the secondary structure).
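The alternation visible in Table 2 can be made explicit with a very coarse
labeling of the torsion angles. The thresholds below are a crude assumption
introduced only for this illustration (they are not the region boundaries used
in the paper or a standard Ramachandran definition); the angle values are those
listed in Table 2.

```python
# (pattern number, membership, phi, psi) as listed in Table 2.
TABLE_2 = [
    (1, 0.997093, -52.09, -51.39),
    (5, 0.994286, -127.88, 161.59),
    (6, 0.993221, -73.57, -23.06),
    (19, 0.981746, -118.25, 115.68),
    (20, 0.981539, -64.71, -40.96),
]


def coarse_region(phi, psi):
    """Very rough region label; the thresholds are assumptions of this sketch."""
    if phi < 0.0 and -120.0 <= psi <= 50.0:
        return "alpha"
    if phi < 0.0 and psi > 90.0:
        return "beta"
    return "other"


labels = [coarse_region(phi, psi) for _, _, phi, psi in TABLE_2]
memberships = [membership for _, membership, _, _ in TABLE_2]
```

Under these assumed thresholds the five patterns alternate between the two
regions even though all of their membership values exceed 0.98, which is the
observation the hypothesis rests on.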

To corroborate this hypothesis we analyzed the patterns processed by the
ANN. We were able to find knowledge already reported by Qian & Sejnowski
[9]: the amino acid Isoleucine tends to form a beta structure (with its
corresponding phi and psi angles) while Leucine tends to form alpha. When using
a traditional orthogonal representation this behavior does not introduce
any problem, since each amino acid has a unique representation. When using a
physical-chemical attribute this behavior leads to a problem during the process
of partitioning the input space. Because Isoleucine and Leucine have very close
physical-chemical attribute values, the learning process of the ANN captured
this information and considered both as similar entries. The rule extraction
process captured this information and exposed it when reporting the patterns
belonging to each rule. This knowledge also explains the RMS error obtained.

Fig. 6. Patterns covered by rule 5 plotted using the torsion angles; the X axis
represents phi and the Y axis represents psi



7 Conclusion 

In this paper we were able to explain why the Hydropathy Index and Isoelectric
Point attributes, processed by a Neural Network approach, are not enough to
predict the backbone dihedral angles (phi and psi angles). To obtain these
results we defined and followed a methodology composed of five main steps:
database definition; network training; rule extraction; results interpretation;
and hypothesis formulation.

We presented the rule extraction process as a viable tool to interpret the
knowledge acquired by ANNs. We can guarantee that the rule extraction process is
statistically valid, but the experimental proof of the biochemical knowledge
discovered is beyond the scope of this work and should be done by qualified
technicians.

As already stated, we chose HP and pI due to the constant references in the
literature to their importance in the protein folding process. The same
methodology defined here could be applied to any other physical-chemical
attribute.

In future work we plan to analyze different attributes and different input
encodings, such as the common orthogonal one.

Abstract. In this paper we address the problem of finding an object in
a polygonal environment as quickly as possible on average, with a team
of mobile robots that can sense the environment.

We show that for this problem, a trajectory that minimizes the distance
traveled may not minimize the expected value of the time to find the object.
We prove the problem to be NP-hard by reduction; therefore, we propose a
heuristic based on a utility function. We use this utility function to drive a
greedy algorithm in a reduced search space that is able to explore several steps
ahead without incurring too high a computational cost. We have implemented this
algorithm and present simulation results for a multi-robot scheme.



1 Introduction 

The problem of determining a good strategy to accomplish a visibility-based
task such as environment modeling [4], pursuit-evasion [7] [8], or object finding
[6] [15] is a very challenging and interesting research area, especially when
the sensors are not static but rather are carried by mobile robots.

In this paper we address the problem of finding an object in an unknown 
location somewhere inside a polygonal environment as quickly as possible on 
average. For this, we use a team of mobile robots that can sense the environ- 
ment. This is the optimization problem of minimizing the expected value of time 
required to find the object, where time is a random variable defined by a search 
path together with the probability density function associated to the object’s lo- 
cation. The possible applications have a wide range, from finding a specific piece 
of art in a museum to search and rescue of injured people inside a building. 

We present a discrete formulation, in which we use a visibility-based
decomposition of the polygon to convert the problem into a combinatorial one. We
define particular locations from where the robot senses the environment
(guards). The guards’ visibility regions are used to calculate the probability
of seeing an object for the first time from a particular location. These
locations are chosen from an appropriate set which is computed automatically.
With this, we are able to abstract the problem to one of finding a path in a
graph whose nodes represent the sensing locations. Trajectories are then
constructed from arcs in a reduced visibility graph.

We show that for this problem, a trajectory that minimizes the distance
traveled may not minimize the expected value of the time to find the object.
We prove the problem to be NP-hard by reduction; therefore, we propose a
heuristic based on a utility function, defined as the ratio of a gain over a
cost. We use this utility function to drive a greedy algorithm in a reduced
search space that is able to explore several steps ahead without incurring too
high a computational cost. We have implemented this algorithm and present
simulation results for a multi-robot scheme.



2 Problem Definition 

In general terms, we define the problem of searching for an object as follows: 
Given one or more mobile robots with sensing capabilities, a completely known 
environment and an object somewhere in the world, develop a motion strategy 
for the robots to find the object in the least amount of time on average. 

The environment W is known, and modeled as a polygon that may contain 
holes (obstacles). The obstacles generate both motion and visibility constraints. 

Furthermore, we assume that the probability of the object being at any specific
point is uniformly distributed throughout the polygon’s interior. Therefore,
the probability of the object being in any subset $R \subseteq W$ is
proportional to the area of $R$.

We also assume that we start with a set of locations L (also known as guards, 
from the art gallery problem [9]) so that each point in W can be seen from at 
least one location in L. There are several criteria for determining the goodness 
of this set. For example, the minimal number of locations (art gallery problem), 
locations along the shortest path that covers the whole environment (shortest 
watchman path [14]), and so on. In [13] we propose an algorithm for determining 
this set. The basic idea is to place guards inside the region bounded by inflection 
rays of the aspect graph. These regions have the property that a point inside
can see both sides of reflex vertices (those with internal angle greater than $\pi$).

The visibility region of location $L_j$, denoted $V(L_j)$, is the set of points
in $W$ that have a clear line of sight to $L_j$ (the line segment connecting
them does not intersect the exterior of $W$). The set $L$ is chosen so that the
associated visibility regions define a cover of $W$. This means that their union
is the whole environment, that is, $\bigcup_j V(L_j) = W$. We neither require
nor assume the set $L$ to be minimal.

For the sake of simplicity, we will first present the basic algorithm for the case 
of a single robot and then we will extend it for the general multi-robot case. 

Our exploration protocol is as follows: the robot always starts at a particular
location in $L$ (the starting point) and visits the other locations as time
progresses (it follows the shortest paths between them). It only gathers
information about the environment (sensing) when it reaches one of these
locations; it does not sense while moving. We describe the route followed by the
robot as a series of locations $L_{i_k}$ that starts with the robot’s initial
location and includes every other location at least once. Note that while $L_j$
refers to locations in the environment, $L_{i_k}$ refers to the order in which
those locations are visited. That is, the robot always starts at $L_{i_0}$, and
the $k$-th location it visits is referred to as $L_{i_k}$.

For any route R, we define the time to find the object T as the time it takes 
to go through the locations - in order - until the object is first seen. 

Our goal is to find the route that minimizes the expected value of the time
it takes to find the object

$$E[T \mid R] = \sum_j t_j \, P(T = t_j) \qquad (1)$$

where

$$P(T = t_j) = \frac{\mathrm{Area}\left(V(L_{i_j}) \setminus \bigcup_{k<j} V(L_{i_k})\right)}{\mathrm{Area}(W)} \qquad (2)$$


Here, $t_j$ is the time it takes the robot to go from its initial position,
through all locations along the route, until it reaches the $j$-th visited
location $L_{i_j}$, and $P(T = t_j)$ is the probability of seeing the object for
the first time from location $L_{i_j}$. Since the robot only senses at specific
locations, we also denote this probability of seeing the object for the first
time from location $L_{i_j}$ as $P(L_{i_j})$.

Explicitly, the probability of seeing the object for the first time from a given
location is proportional to the area of the visibility region of that location,
$V(L_{i_j})$, minus the already explored space up to that point,
$\bigcup_{k<j} V(L_{i_k})$.



2.1 Expected Value Vs. Worst Case 

It is important to note the difference between minimizing the expected value of 
the time to find an object and minimizing the time it would take in the worst 
case. To minimize the worst case time, the robot must find the shortest path that 
completely covers the environment (the Shortest Watchman Tour problem [1]). 
This usually means that no portions of the environment are given any priority 
over others and the rate at which new portions of the environment are seen is 
not important. 

On the other hand, to minimize the expected value of the time, the robot must 
gain probability mass of seeing the object as quickly as possible. For a uniform 
object PDF, this translates into sensing large portions of the environment as soon 
as possible, even if this means spending more time later to complete covering 
the whole environment. For non-uniform PDFs, the robot should visit the most 
promising areas first. We believe this represents another paradigm for coverage 
tasks, where it is important to gain as much new information in the shortest time 
as possible. This could be very useful in applications where the time assigned to 
the task is limited or not completely known. 

Fig. 1. Example with a simple environment 



The trajectories that satisfy the previous two criteria are not the same. In 
fact, for a given environment, the route that minimizes the distance traveled may 
not minimize the expected value of the time to find an object along it. Consider 
the example in Fig. 1. 

The robot starts in the corridor at location $L_0$. The object will always be in 
one of two rooms, and the probability of it being in either is related to the size 
of the room. 

If the robot goes to the smaller room first and then moves on to the larger 
room (route 1), it reaches $L_1$ at time 1 and $L_2$ at time 7. The expected value of 
the time it takes to find the object following this route is $E[T \mid (L_0, L_1, L_2)] = 
(0.1)(1) + (0.9)(7) = 6.4$. The robot always completes its search after 7 seconds. 

On the other hand, if the robot moves to the larger room first and then goes 
to the smaller room (route 2), it reaches $L_2$ at time 5 and $L_1$ at time 11. The 
expected time in this case is $E[T \mid (L_0, L_2, L_1)] = (0.9)(5) + (0.1)(11) = 5.6$. In 
the worst case, it will take the robot 11 seconds to find the object. 

A robot following route 1 always finishes searching within 7 seconds, while a 
robot following route 2 can take up to 11 seconds. Route 1 minimizes the distance traveled. 
However, the average time it takes for a robot following route 1 to find the object 
is 6.4 seconds whereas for route 2 it is only 5.6 seconds. Route 2 minimizes the 
expected value of the time to find an object. 

Thus, a trajectory that is optimal in the distance traveled does not necessarily 
minimize the expected value of the time to find an object along it. 
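
The comparison between the two routes can be checked numerically. The following is a minimal Python sketch of equation (1) applied to the Fig. 1 example; the function name `expected_time` and the list-based encoding of a route are illustrative assumptions, not part of the original paper.

```python
def expected_time(arrival_times, first_sight_probs):
    """Expected time to find the object along a route, as in equation (1).

    arrival_times[j] is the time t_j at which the j-th sensing location is
    reached; first_sight_probs[j] is P(T = t_j), the probability of seeing
    the object for the first time there.
    """
    return sum(t * p for t, p in zip(arrival_times, first_sight_probs))


# Route 1: small room first (time 1, prob 0.1), then large room (time 7, prob 0.9).
route_1 = expected_time([1, 7], [0.1, 0.9])

# Route 2: large room first (time 5, prob 0.9), then small room (time 11, prob 0.1).
route_2 = expected_time([5, 11], [0.9, 0.1])

print(route_1, route_2)
```

Up to floating-point rounding, the two values are 6.4 and 5.6, so route 2 has the lower expected time even though its worst-case time is longer.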

3 Proposed Solution 

Since we assume that we are given a set of sensing locations that completely 
cover the environment, we are interested in finding an order of visiting those 
locations - the problem becomes a combinatorial search. 

In general, the robot will not be able to travel between two locations by 
following a straight line. In these cases, we use a reduced visibility graph [10] and 
Dijkstra’s Algorithm to follow the shortest path between them. 

3.1 Reduction from an NP-Hard Problem 

The Minimum Weight Hamiltonian Path Problem, known to be NP-hard [3], 
can be reduced to the problem of finding the optimal visiting order of sensing 
locations which minimizes the expected time to find an object. 

In order to make a formal reduction, we abstract the concept of environment 
and visibility regions. We only consider a set of locations that have an associated 
probability of seeing the object and whose visibility regions do not overlap. 

The reduction consists in defining the distance between the sensing locations 
as the edge weights of the Minimum Weight Hamiltonian Path Problem and 
setting the probabilities uniformly (same value for all). 

Since the probabilities are set uniformly, the route that minimizes the ex- 
pected time will be exactly the same as the one that minimizes the distance 
traveled. This happens because the expected value of the time to find an object 
is determined only by the time it takes to reach locations along the route. Since 
time is proportional to distance, the route that minimizes time will also minimize 
the distance. 

Given that the solutions to both problems are the same ordering of locations, 
finding a polynomial algorithm to solve these instances of the defined problem 
would also solve the Minimum Weight Hamiltonian Path Problem in polynomial 
time, thereby proving that the proposed problem is NP-hard. 



3.2 Utility Heuristic 



Since finding an optimal solution is intractable in general, we have decided to 
implement an iterative, greedy strategy, one that tries to achieve a good result 
by exploring just a few steps ahead. 

We propose a greedy algorithm, called utility greedy, that tries to maximize a 
utility function. This function measures how convenient it is to visit a determined 
location from another, and is defined as follows: 



$$U(L_j, L_k) = \frac{P(L_k)}{\mathrm{Time}(L_j, L_k)} \qquad (3)$$



This means that if the robot is currently in $L_j$, the utility of going to location 
$L_k$ is proportional to the probability of finding the object there and inversely 
proportional to the time it must invest in traveling. 

A robot using this function to determine its next destination will tend to 
prefer locations that are close and/or locations where the probability of seeing 
the object is high. Intuitively, it is convenient to follow such a strategy, but its 
relationship with the expected value minimization will be more evident after the 
following analysis. 

Consider a definition of expectation for a non-negative random variable, such 
as time, from [11] 

$$E[T \mid R] = \int_0^\infty P(T > t)\,dt = \int_0^\infty \left(1 - P(T \le t)\right) dt = \int_0^\infty \left(1 - F_T\right) dt \qquad (4)$$

in which $F_T$ is a cumulative distribution function. 

Fig. 2. Defined cumulative distribution functions: (a) $F_T$, (b) $1 - F_T$ 



In our problem, every valid trajectory R defines a particular cumulative 
distribution function of finding the object, $F_T$. Since we are dealing with a discrete 
problem, the distributions are only piecewise continuous with the discontinuities 
being the times at which the robot reaches the distinct locations along the route, 
as shown in Fig. 2a. 

By (4), we know that the expected value of a random variable with 
distribution $F_T$ is the area under the curve $1 - F_T$, shown in Fig. 2b. This area is the 
value we want to minimize. 

One method for making this area small is to have the time intervals as small 
as possible and the probability changes (down steps) as large as possible. This is 
the notion that our utility function in (3) captures; its value is larger when the 
probability of seeing the object from a particular location is high (large down 
step) and/or when the location is near (small time interval). 
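
To make the link between equations (1) and (4) concrete, the following Python sketch integrates the step function $1 - F_T$ for a discrete route. It is only an illustration under the assumption that all probability mass is collected at the sensing times; the function name `area_under_survival` is hypothetical.

```python
def area_under_survival(arrival_times, first_sight_probs):
    """Area under the step curve 1 - F_T for a discrete route.

    Between two consecutive sensing events the curve is flat at the
    probability mass that has not been seen yet, so the area is a sum of
    rectangles (height = remaining mass, width = time interval).
    """
    area = 0.0
    remaining = 1.0      # value of 1 - F_T before the first sensing event
    previous_t = 0.0
    for t, p in zip(arrival_times, first_sight_probs):
        area += remaining * (t - previous_t)
        remaining -= p   # down step at time t
        previous_t = t
    return area


# Route 2 of the Fig. 1 example: large room at time 5, small room at time 11.
area = area_under_survival([5, 11], [0.9, 0.1])
direct = 5 * 0.9 + 11 * 0.1   # the same expectation computed with equation (1)
print(area, direct)
```

Both computations give the same value up to rounding, which illustrates why large early down steps and short time intervals reduce the expected time.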

3.3 Efficient Utility Greedy Algorithm 

The utility function in (3) is sufficient to define a 1-step greedy algorithm. At each 
step, simply evaluate the utility function for all available locations and choose 
the one with the highest value. This algorithm has a running time of $O(n^2)$. 
However, it might be convenient to explore several steps ahead instead of just 
one to try to “escape local minima” and improve the quality of the solution 
found. The downside of this idea is that it usually increases the complexity of 
the algorithm by a factor of $O(n)$ for each look-ahead step. 
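
The 1-step greedy strategy can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes non-overlapping visibility regions, so that each location's probability is fixed in advance, and the names `utility_greedy`, `probs` and `travel_time` are hypothetical.

```python
def utility_greedy(start, probs, travel_time):
    """Order sensing locations by repeatedly maximizing U = P / Time.

    probs[k] is the probability of seeing the object from location k
    (assumed fixed, i.e. visibility regions do not overlap).
    travel_time[j][k] is the travel time from location j to location k.
    """
    route = [start]
    unvisited = sorted(set(probs) - {start})
    while unvisited:
        current = route[-1]
        # Evaluate the utility in (3) for every remaining location.
        best = max(unvisited, key=lambda k: probs[k] / travel_time[current][k])
        route.append(best)
        unvisited.remove(best)
    return route


# Data read off the Fig. 1 example: L1 is 1 time unit from L0, L2 is 5 time
# units from L0, and the two rooms are 6 time units apart.
probs = {"L1": 0.1, "L2": 0.9}
travel_time = {
    "L0": {"L1": 1, "L2": 5},
    "L1": {"L2": 6},
    "L2": {"L1": 6},
}
route = utility_greedy("L0", probs, travel_time)
print(route)
```

On this small example the greedy choice visits the larger room first (utility 0.9/5 versus 0.1/1), which matches the route with the lower expected time. Each step scans all remaining locations, which gives the quadratic running time mentioned above.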

To reduce this effect we propose a second heuristic that reduces the branching 
factor. The heuristic is that the children of each location can only be those other 
locations that are not strictly dominated according to the two variables in the 
utility function, $P(L_k)$ and $\mathrm{Time}(L_j, L_k)$. As seen from the j-th location $L_j$, a 
location $L_k$ strictly dominates another $L_l$ if both of the following conditions are 
true: 

$$P(L_k) > P(L_l),$$

$$\mathrm{Time}(L_j, L_k) < \mathrm{Time}(L_j, L_l).$$







A. Sarmiento, R. Murrieta-Cid, and S. Hutchinson 



It is straightforward that dominating locations will lie on the convex hull of 
the remaining set of locations when plotted on the probability vs. distance plane. 
The endpoints of this partial convex hull are not considered as candidates since 
they are not defined locations. 
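
The strict-dominance test itself is easy to state in code. The sketch below uses a plain quadratic scan rather than the convex hull computation mentioned in the text, purely for readability; the function name `non_dominated` and the candidate data are made up for illustration.

```python
def non_dominated(candidates):
    """Return the candidate locations that no other candidate strictly dominates.

    candidates maps a location name to a pair (probability, travel_time),
    both measured from the robot's current location. A candidate k is
    strictly dominated if some other candidate has a higher probability
    AND a shorter travel time.
    """
    kept = []
    for k, (p_k, t_k) in candidates.items():
        dominated = any(
            p_l > p_k and t_l < t_k
            for l, (p_l, t_l) in candidates.items()
            if l != k
        )
        if not dominated:
            kept.append(k)
    return kept


candidates = {
    "A": (0.5, 2.0),
    "B": (0.3, 3.0),   # dominated by A: lower probability and farther away
    "C": (0.6, 5.0),   # kept: highest probability
    "D": (0.1, 1.0),   # kept: closest location
}
children = non_dominated(candidates)
print(children)
```

Only the surviving locations would be expanded as children in the look-ahead tree, which is what reduces the branching factor.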

The final algorithm for a single robot is as follows: 

(1) For the last location along the current solution (initially just the robot 
starting location) explore the possible routes (create a tree breadth-first) until 
the number of nodes is of order $O(n)$. 

(2) For each node that needs to be expanded, compute the set of locations that 
are not strictly dominated by others and only choose those as children. This 
can be done with a convex hull algorithm with complexity $O(n \log n)$. 

(3) When the number of nodes in the exploration tree has reached order $O(n)$, 
choose the best leaf according to the heuristic in (3), discard the current tree 
and start over with the best node as root. 

The complexity of the algorithm is proportional to exploring a tree of order 
$O(n)$, choosing the best children for each node in the tree with a convex hull 
algorithm in $O(n \log n)$, and repeating $n / \log n$ times to generate a complete route. 
This is 

$$O\left(n \cdot n \log n \cdot \frac{n}{\log n}\right) = O(n^3).$$

Of course, this result depends on the number of dominating locations being 
significantly smaller than n on average, which may be difficult to determine for 
a specific problem. We know, for example, that the expected number of points 
on the convex hull of a set sampled uniformly from a convex polygon is of order 
$O(k \log n)$ for a k-sided polygon [2]. In the worst case, when the branching factor 
is not reduced at all, our algorithm only explores one step at a time and has a 
running time of 

$$O(n \cdot n \log n \cdot n) = O(n^3 \log n). \qquad (5)$$







3.4 The General Multi-robot Case 

For the case of a single robot, each sensing location determines a state for the 
environment search. Each node in the search graph corresponds exactly to one 
location. For the case of multiple robots, the state has to be augmented to include 
the status of every robot in the team. Each robot can be performing one of two 
possible actions: sensing or traveling. Therefore, a node in the search graph 
now corresponds to an n-tuple of robot actions. We assume that two or more 
robots will never arrive at sensing locations at exactly the same time, which is 
true for sensing locations that are in general position. For example, the n-tuple 
$(T_4, S_{17}, T_8)$ represents a state in which the first robot (first location in the tuple) 
is traveling towards location $L_4$, the second robot is sensing at location $L_{17}$ and 
the third robot is moving to location $L_8$. 

This approach allows us to discretize time into uneven intervals bounded by 
critical events. These events correspond to the times robots reach given locations 

Fig. 3. Simulation results for two environments and a team of two robots 



and sense the environment. This formulation defines a new state space which is 
searched with the same utility function algorithm proposed for a single robot. 

Under our scheme, the possible next states only consider the locations al- 
ready visited by all the robots, and once a robot commits to going to a certain 
location, it will not change regardless of what the other robots are doing. This 
yields a scheme which guarantees that one more location will be visited at each 
exploration step. This means that the computational complexity for a team of 
robots is exactly the same as for the single robot case established in (5). How- 
ever, this scheme will not explore all possible path permutations (which would 
be exponential), only a reduced state space. 

3.5 Probability Computation for Polygonal Environments 

We assume a uniform probability density function of the object’s location over 
the environment; consequently, the probability of seeing the object from any 
given location is proportional to the area of the visibility region from that 
location (point visibility polygon [2]). This visibility polygon can be computed in 
time linear in the number of vertices for simple polygons and in $O(n \log n)$ time 
for general polygons [2]. If the results are cached, this has to be done only once 
for each location. 
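
Under the uniform-density assumption, the probability for a single location reduces to a ratio of polygon areas. The following Python sketch computes that ratio with the shoelace formula; it assumes simple polygons without holes and does not compute the visibility polygon itself, and the names `polygon_area` and `sighting_probability` are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as (x, y) vertices in boundary order."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0   # shoelace cross term
    return abs(twice_area) / 2.0


def sighting_probability(visibility_polygon, workspace_polygon):
    """Probability of seeing a uniformly distributed object from one location."""
    return polygon_area(visibility_polygon) / polygon_area(workspace_polygon)


# Toy data: a 10 x 10 workspace in which a location sees a 3 x 10 strip.
workspace = [(0, 0), (10, 0), (10, 10), (0, 10)]
visible = [(0, 0), (3, 0), (3, 10), (0, 10)]
p = sighting_probability(visible, workspace)
print(p)
```

For an environment with obstacles, the obstacle areas would have to be subtracted from the workspace area; that case is left out of this sketch.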

The probability of seeing the object for the first time at location $L_{i_j}$, denoted 
$P(T = t_j)$, is proportional to the area visible from $L_{i_j}$ minus the area already 
seen from locations $L_{i_k}$, $\forall k < j$, as stated in (2). This involves polygon 
clipping operations of union and difference. We know that any clipping algorithm 
supporting two arbitrary polygons must have a complexity of at least $O(nm)$, 
where n and m are the number of vertices in each polygon [5]. 

Fig. 4. A 3-D workspace with a mobile manipulator 

The cost of performing these clipping operations must be added to the com- 
plexity in (5) to describe the total complexity of the algorithm when applied to 
general polygons. 

4 Simulation Results 

For our simulations, we implemented routines for computing visibility polygons, 
the reduced visibility graph and shortest paths (Dijkstra’s Algorithm) . For calcu- 
lating the union of visibility regions, we used the gpc library developed by Alan 
Murta based on an approach proposed in [16] . 

Fig. 3 shows the paths generated by our proposed approach for two different 
environments and a team of two robots. Parts (a) and (b) show the environments 
(black obstacles), the set of sensing locations (crosses) and the initial position 
(black circle). Parts (c) and (d) show the final paths and different regions sensed 
by the robots. The light grey path corresponds to robot 1, and the dark grey to 
robot 2. The dark grey area was only sensed by robot 1, the light grey only by 
robot 2 and the white area was seen by both. 

As can be seen, the robots split the effort of sensing the environment in 
approximately equal amounts, even when an intuitive partition is not evident, 
as in the case of the environment shown in the right column of Fig. 3. 

For the case of a single robot, the results obtained by our algorithm are close 
to the optimal case but with a major improvement in computation time [12]. In some 
instances, we have observed over a thousandfold improvement. For a team of 
multiple robots, the expected value of the time is further reduced, as expected, 
but with the exact same order of computational complexity. 

5 Conclusions and Future Work 

In this paper we proposed an efficient approach to solve the problem of finding an 
object in a polygonal environment as quickly as possible on average. We proposed 
the heuristic of a utility function to drive a greedy algorithm in a reduced search 
space that is able to explore several steps ahead without incurring too high a 
computational cost. We implemented this algorithm and presented simulation results 
for a multi-robot scheme. 

In [13] we address the problem of continuous sensing for expected value search 
with a single robot. For this, we use the calculus of variations to compute locally 
optimal sub-paths and concatenate them to construct complete trajectories. As 
future work, we plan on extending the continuous sensing case to multiple robots. 

Currently, we are also working on the case where the environment is three 
dimensional and the robot is a seven degree of freedom mobile manipulator with 
an “eye-in-hand” sensor, like the one shown in Fig. 4. 

Abstract. The real-time control proposed in this paper combines the 
event-based control theory with numeric potential field and computer 
vision techniques to teleoperate, via the Internet, a nonholonomic robot. 
The user receives visual and haptic sensory information in real time, 
allowing obstacle avoidance while navigating a remote environment. A 
numeric grid generated from a top view of the workspace is used to pro- 
duce virtual forces around the obstacles. A computer-based vision system 
recognizes landmarks at the top of the robot to obtain its current posi- 
tion. The robot position is combined with the grid to generate real-time 
haptic interaction. A graphic landmark on the workspace image repre- 
sents the predicted and current robot position. The system was tested 
experimentally to validate the approach using an Internet2 connection. 



1 Introduction 

Providing the user with sensory information in real time is always desirable to 
increase certainty when performing Internet-based teleoperation. Visual feed- 
back is common in most of the Internet-based telerobotics systems reported in 
the literature [1-3]. Recent Internet-based teleoperation applications, using an 
event-based controller design introduced in [4], successfully sent sensory infor- 
mation in real time dealing with unpredictable packet delays, lost packets, and 
disconnection [5,6]. A real-time control for Internet-based teleoperation with 
force reflection using the event-based controller was presented in [7]. This teleop- 
eration system controls a holonomic mobile robot using information from sonar 
sensors to generate virtual forces that correspond to a robot status in the remote 
environment. 

This paper proposes a real-time control for Internet-based teleoperation of a 
nonholonomic differentially-driven mobile robot using a visual sensor to provide 
sensory information in real time. The teleoperation system makes use of the vi- 
sual sensory information to generate haptic and visual feedback that corresponds 
to a robot status in the workspace. The implemented teleoperation system was 
integrated into a prior mobile robotics Virtual Laboratory (VL) framework [8] as 
another operation mode. The VL teleoperation mode allows the user to control 
the robot to navigate a small workspace avoiding obstacles. The system combines 
a modified real-time event-based controller with computer vision and potential 
field techniques to control the robot. The user indicates the obstacles’ vertices on 
a workspace image and the system computes the robot’s C-Space. This C-Space 
is converted into a numeric potential field grid where the cell values represent 
spatial information in a similar manner as the occupancy grids [9] . This numeric 
grid defines three navigation spaces, as shown in Fig. 1. The spaces have cells 
with different probability values of being occupied by an obstacle, $P(obs)$: 

$$P(obs) = \begin{cases} 1 & \text{forbidden space} \\ f(d) & \text{constrained space} \\ 0 & \text{free space} \end{cases} \qquad (1)$$

where f{d) is a linear function of the distance d between the robot and the 
forbidden space (the obstacle). A computer vision system calculates the robot 
position in real time to be used as reference to perform several potential field 
measures on the numeric grid. These measures are combined to generate a re- 
pulsive virtual force around the obstacles. The system provides real-time visual 
information regarding the current robot’s position in the remote environment by 
drawing a graphic mark on the workspace images shown by the user interface. 

This document describes the proposed real-time control to teleoperate the 
nonholonomic robot, as well as the system implementation and the experimen- 
tal results. The article is organized as follows: Section 2 gives a detailed descrip- 
tion of the real-time control, Section 3 describes the system implementation and 
the experimental results using an Internet2 connection, and conclusions are pre- 
sented in Section 4. 



[Figure label: Obstacle's Vertices] 

Fig. 1. The numeric potential field takes as reference the C-Space 

2 The Real-Time Control 

The numeric potential field is set when the user indicates the physical obstacle 
vertices on a top view image of the workspace. These vertices are used to compute 
a C-Space. Then the C-Space is converted into a numeric potential field grid, 
which is used to generate a repulsive virtual force when the robot approaches 
an obstacle. The virtual force vector is obtained based on the proximity and 
direction to the obstacles, which are calculated from several potential field mea- 
sures taken from the numeric grid using as reference the current robot’s position 
obtained from the computer vision system. This virtual force is converted to a 
velocity value and introduced into the real-time control that generates a velocity 
tracking error. This tracking error is fed to the user as a force feedback to indi- 
cate the proximity and direction to the obstacles. Furthermore, a graphic mark 
is also drawn in real time on the image to indicate the robot’s position. 

2.1 System Model 

The system architecture on the Internet was based on two elements. The first 
element, called Guest, is a computer connected to the Internet containing one or 
more user interfaces that allow the user to teleoperate the slave device, a Khepera 
minirobot [10]. The other element, called Host, is a computer connected 
to the Internet containing the necessary applications to remotely operate the slave 
device. The Guest presents a user interface that shows a workspace image and is 
connected to a joystick. The robot and a camera are connected to the Host, as 
shown in Fig. 2. Each block will be described in detail and all variables are defined 
in Table 1 and referenced to s, where s is the event, $s \in \{1, 2, \ldots, n\}$, and represents 
the number of commands issued by the user. The stability and synchronization 
of the system result from the use of a non-time based variable s as reference [11]. 

Image Acquisition. A top view of the workspace is acquired by the camera 
and converted into a discrete image using a video frame grabber. This process 
generates a matrix $I(s)$ of integer elements using a 256-level grayscale. The 
matrix $I(s + n)$ is compressed using the Run Length Encoding (RLE) method 
before being sent to the potential field creation procedure. 
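
As a reminder of how Run Length Encoding works on a row of grayscale pixels, here is a minimal Python sketch. The (value, count) pair format and the function names are assumptions chosen for illustration; the paper does not specify the exact encoding used by the system.

```python
def rle_encode(pixels):
    """Compress a sequence of pixel values into (value, run_length) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = [0, 0, 0, 255, 255, 7]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

RLE pays off when the image contains long runs of identical values, which is plausible for a top view with a uniform background.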

Potential Field Creation. The image matrix $I(s + n)$ is decompressed and 
displayed on the user interface. Next, the user interacts with the user interface to 
define a C-Space based on the workspace image. Using the computer mouse, the user 
indicates the obstacles’ vertices on the image and the system calculates and draws 
a convex polygon that is expanded in relation to the robot radius, to build a 
C-Obstacle. This procedure is carried out m times, once for each physical obstacle. 
The system then converts the displayed C-Space bitmap into a numeric matrix M 
of the same size as $I(s)$, where the obstacles are expanded outward with decreasing 
cell values to create a numeric potential field. The numeric matrix M is calculated once. 
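
One plausible reading of how the obstacles are expanded is a wavefront in which each ring of cells around a C-Obstacle receives a value one lower than the previous ring. The Python sketch below implements that reading on a small grid; the 4-connected neighborhood, the `peak` value and the function name are all assumptions, since the paper does not give these details.

```python
from collections import deque


def numeric_potential_field(obstacle_mask, peak=4):
    """Build a grid whose values decrease by 1 per cell away from obstacles.

    obstacle_mask is a list of rows; truthy cells belong to a C-Obstacle.
    Obstacle cells get the value `peak`; free cells far away stay at 0.
    """
    rows, cols = len(obstacle_mask), len(obstacle_mask[0])
    field = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if obstacle_mask[r][c]:
                field[r][c] = peak
                queue.append((r, c))
    # Multi-source breadth-first expansion with a decremented value per ring.
    while queue:
        r, c = queue.popleft()
        value = field[r][c] - 1
        if value <= 0:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and field[nr][nc] < value:
                field[nr][nc] = value
                queue.append((nr, nc))
    return field


grid = numeric_potential_field([[1, 0, 0, 0, 0]], peak=3)
print(grid)
```

Because the grid depends only on the obstacle layout, it can be computed once and reused for every event, in line with the statement that M is calculated once.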

Image Processing. The matrix $I(s)$ is binarized and segmented to find every 
object in the image. Then the objects are characterized using Hu invariant 

Fig. 2. Block diagram of the real-time control for Internet-based teleoperation 



Table 1. Definition of the Variables used in the Block Diagram 

Variable Definition                                  Concept 
$F_m(s) = [F_{mx}(s)\ F_{my}(s)]^T$                  Applied force by Joystick 
$X_m(s) = [X_{mx}(s)\ X_{my}(s)]^T$                  Joystick position 
$V_d(s) = [V_{dleft}(s)\ V_{dright}(s)]^T$           Desired velocity 
$V_a(s) = [V_{aleft}(s)\ V_{aright}(s)]^T$           Applied velocity 
$V_m(s) = [V_{mleft}(s)\ V_{mright}(s)]^T$           Measured velocity 
$E(s) = [E_x(s)\ E_y(s)]^T$                          Tracking error 
$V_e(s) = [V_{eleft}(s)\ V_{eright}(s)]^T$           Potential field velocity 
$P_k(s) = [x(s)\ y(s)\ \theta(s)]^T$                 Robot position 



moments to identify a robot’s artificial landmark. Two circles compose the 
robot’s landmark: the bigger circle represents the robot’s front, while the small 
one indicates the rear. The robot’s position $P_k(s)$ is the centroid of the big circle. 
The robot orientation is the angle defined by a line passing through the centroids 
of both circles and the x-axis of the reference frame. 
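
The pose computation from the two landmark centroids can be sketched as follows. This Python fragment assumes the centroids are already expressed in the reference frame and that the heading points from the rear circle to the front circle; the function name `robot_pose` is hypothetical.

```python
import math


def robot_pose(front_centroid, rear_centroid):
    """Return (x, y, theta) from the centroids of the two landmark circles.

    The position is the centroid of the big (front) circle. The orientation
    is the angle between the x-axis and the line through both centroids,
    taken here in the rear-to-front direction (an assumption).
    """
    x, y = front_centroid
    theta = math.atan2(front_centroid[1] - rear_centroid[1],
                       front_centroid[0] - rear_centroid[0])
    return x, y, theta


pose_east = robot_pose((2.0, 1.0), (1.0, 1.0))    # heading along +x
pose_north = robot_pose((1.0, 2.0), (1.0, 1.0))   # heading along +y
print(pose_east, pose_north)
```

Using `atan2` rather than a plain arctangent keeps the angle well defined in all four quadrants. In raw image coordinates, where y usually grows downward, the sign of the angle would need to be flipped.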

Operator and Joystick. The joystick and the operator have a spring-like 
behavior [12]; the operator generates new joystick positions to compensate for the 
displacements generated by the force feedback, according to the following function: 

$$X_m(s) = \frac{F_m(s-1)}{K_m}, \qquad (2)$$

where $K_m$ is a scaling constant, the new joystick position is $X_m(s)$ and the 
previously played force is $F_m(s-1)$. Note that the rotational component $X_{m\theta}(s)$ 
is not used, since the joystick is not able to feed back force in that direction. 

Joystick Controller. The position $X_m(s)$ is proportionally converted to the 
desired velocity $V_d(s)$ for both left and right robot motors to allow for differential 

Table 2. Equations for Joystick Position Conversion to Desired Velocity 



Xmx (s) , Xmy (^) 


Vdleft 


Vdright 


l X — Y (a} 


Xmy ( s ) 


y Xmx (s) 


1 X\-mx v,* ) 1 -X-my V* ) 


K p 


K p 




X-rny ( s ) 


Xmy{s)+ X ^ 3) 


1 X-mx J 5 t -Xmy \t> ) 


K p 


Kp 


_ Y —X (s') 


Xmy(s)+ X ™ x(s) 


Xyny ( s ) 




k p v 


K p 


— X (o\ -\_x ( 


V f Xmx ( s ) 

my 


X-rny ( s ) 


X-mx 5 i -Xmy \& ) 


Kp 


K p 


+^ms( 5 ), 0 


0 


X-mx ( s ) 
Kp x K x 


Xmx (s) , 0 


Xmx ( s ) 
Kp x K x 


0 



drive. $X_m(s)$ and the equations in Table 2 are used to obtain the velocity $V_d(s)$. 
The equations in the first four rows use $K_x$ to decrease the component $X_{mx}(s)$ 
to generate a hyperbolic speed function on the x-axis of the joystick reference 
frame. $K_p$ is a scaling constant, e.g. $K_x = 3$, $K_p = 100$. 

Proximity and Direction Calculation. Potential field measurements are 
performed taking as reference the robot position $P_k(s)$. These measures are 
treated as proximity sensor readings to generate virtual forces, which are converted 
to velocity values to be fed into the real-time control. A point measure over 
the numeric potential field is called a virtual sensor $M_n(s)$. Six virtual sensors are 
distributed around the robot’s front (as the real ones) and another six are placed 
around the robot’s rear. The virtual sensors make proximity measures on the 
numeric potential field grid around the obstacles. These measures are condensed by 
equations 3 and 4 into two variables: proximity $P_r(s)$ and direction $\phi(s)$ [13]: 

$$P_r(s) = \frac{\max_n M_n(s)}{K_s}, \qquad (3)$$

$$\phi(s) = \frac{90\,(M_5(s) - M_0(s)) + 45\,(M_4(s) - M_1(s)) + 5\,(M_3(s) - M_2(s))}{9\left(1 + \sum_{i=0}^{5} M_i(s)\right)}. \qquad (4)$$

The variable $P_r(s)$ is calculated based on the maximum proximity measure, 
which is divided by a scaling constant $K_s$. The variable $\phi(s)$ is computed by 
subtracting the opposite measures of the virtual sensors from the corresponding 
left or right sides; therefore a positive or negative sign is obtained, indicating the 
left or right side of the robot’s front/rear. The results from the subtraction are 
multiplied by weights corresponding to the location of the virtual sensors. The 
$\phi(s)$ value is obtained by summing the weighted subtraction results. Then $\phi(s)$ is 
divided by a summation to normalize the results into a $-10 < \phi < +10$ range. 
The behavior of $P_r(s)$ and $\phi(s)$ is described in detail in [13]. 
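
The two condensed variables can be computed directly from six virtual sensor readings. The Python sketch below follows the description in the text: proximity is the maximum reading divided by $K_s$, and direction is the weighted difference of opposite sensors (weights 90, 45 and 5) normalized by 9 times one plus the sum of the readings. The function name and the sample readings are assumptions for illustration.

```python
def proximity_and_direction(m, k_s):
    """Condense six virtual sensor readings M_0..M_5 into (P_r, phi).

    m is a sequence of six non-negative readings taken on the numeric
    potential field; k_s is the scaling constant K_s.
    """
    if len(m) != 6:
        raise ValueError("expected six virtual sensor readings")
    p_r = max(m) / k_s
    weighted = (90 * (m[5] - m[0])
                + 45 * (m[4] - m[1])
                + 5 * (m[3] - m[2]))
    phi = weighted / (9 * (1 + sum(m)))
    return p_r, phi


# An obstacle seen only by the outermost sensor on one side.
p_r, phi = proximity_and_direction([0, 0, 0, 0, 0, 9], k_s=3)
# A symmetric reading: no side is preferred, so the direction is zero.
_, phi_sym = proximity_and_direction([1, 1, 1, 1, 1, 1], k_s=3)
print(p_r, phi, phi_sym)
```

With these weights the ratio stays strictly inside the interval from -10 to +10 for non-negative readings, since the numerator is bounded by 90 times the sum of the readings while the denominator is 9 times one plus that sum.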

When the robot performs a forward displacement, the six frontal virtual 
sensors are taken into account to calculate $P_r(s)$ and $\phi(s)$ and generate velocity 
$V_e(s)$ according to the potential field. When a backward displacement is 
performed the six rear virtual sensors are taken into account to calculate $P_r(s)$ and 

Table 3. Equations for Proximity Conversion to Potential Field Velocity 



f(s) 


Veleft 


Veright 


f(s) = 0 
</>(s) > 0 
0(s) < 0 


Pr(s)*V dleft (s) 


P r (s) X Vdright ( s ) 


K e 

0 

p r x V d ie ft (s) x cos(- 100(a)) 


K e 

Pr(s) X Vdright ( s ) X cos(10<£(s)) 


K e 

0 


Ke 



$\phi(s)$ and generate $V_e(s)$. The velocity $V_e(s)$ is calculated based on the equations 
in Table 3. In these equations, $P_r(s)$ represents the virtual force vector 
magnitude and $\phi(s)$ is used to obtain the $P_r(s)$ vector projection, on the corresponding 
axis, to generate the velocity $V_e(s)$ on both left and right motors, using $K_e$ as 
a scaling constant. After that, the calculated $V_e(s)$ is subtracted from $V_d(s)$ to 
produce an applied velocity $V_a(s)$ to the robot: 

$$V_a(s) = V_d(s) - V_e(s). \qquad (5)$$

Conversion to Robot Command. The velocity $V_a(s)$ is converted into a 
command string recognized by the robot speed mode [10], with the corresponding 
parameters to set both left and right wheel motor speeds ($V_{aleft}(s)$, $V_{aright}(s)$). 

Slave (Mobile Robot). The robot receives the speed command string and 
executes and holds the command for a specific time frame, then stops and waits 
for the next command. The speed commands set each wheel motor speed through 
the robot PID controller and a PWM (Pulse Width Modulation) generator [10]. 

Speed Acquisition. Once the velocity $V_a(s)$ has been applied to the robot, the speed 
acquisition procedure requests the instantaneous speed [10] of both wheel motors 
($V_{mleft}(s)$, $V_{mright}(s)$), to be sent as the measured velocity $V_m(s)$ to the Host 
system. $V_d(s)$ is taken as a reference; the difference between $V_d(s)$ and $V_m(s)$ 
generates a tracking error $E(s)$ that is converted to a force vector $F_m(s)$ by the 
joystick controller: 

$$E(s) = V_d(s) - V_m(s). \qquad (6)$$

The joystick controller converts the error $E(s)$ into $F_m(s)$ by applying the 
inverse procedure of the Table 2 equations to obtain the error projection $E(s) = 
[E_x(s)\ E_y(s)]^T$ on the x and y axes. This vector $E(s)$ is added to the last applied 
force $F_m(s-1)$. The resulting vector $F'_m(s)$ is multiplied by a scaling constant 
$K_f$ to obtain the current force vector $F_m(s)$. Then, the $F_m(s)$ vector is played 
by the joystick during a specific time frame: 

$$F'_m(s) = F_m(s-1) + E(s), \qquad (7)$$

$$F_m(s) = K_f \, F'_m(s). \qquad (8)$$
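
The error and force updates in equations (6) to (8) amount to a few vector operations per event. The Python sketch below illustrates them with made-up numbers; it skips the inverse Table 2 mapping and simply assumes the error has already been projected onto the x and y axes. The function names are hypothetical.

```python
def tracking_error(v_desired, v_measured):
    """Equation (6): component-wise difference between V_d(s) and V_m(s)."""
    return tuple(d - m for d, m in zip(v_desired, v_measured))


def force_feedback(previous_force, error_xy, k_f):
    """Equations (7) and (8): accumulate the error, then scale by K_f.

    previous_force is F_m(s-1) and error_xy is E(s), both as (x, y) pairs
    in the joystick reference frame.
    """
    accumulated = (previous_force[0] + error_xy[0],
                   previous_force[1] + error_xy[1])
    return (k_f * accumulated[0], k_f * accumulated[1])


# Left/right wheel speeds in pulses/10 ms (illustrative values only).
error_wheels = tracking_error((5, 5), (3, 4))

# Force update with an error assumed to be already projected onto x and y.
force = force_feedback((1.0, 2.0), (0.5, -1.0), k_f=2.0)
print(error_wheels, force)
```

Because each update depends only on the previous event's force and the current error, the computation is indexed by the event s rather than by time, consistent with the event-based design described above.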



Internet. The real-time control takes the communication link as a delay element 
that plays no role in the control model [7] . 

Fig. 3. Teleoperation system configuration to navigate a small workspace 



3 System Implementation 

This section describes the experimental setup of the teleoperation system, which 
is complemented with an independent observer system described in [8] . All user 
interfaces were run in a Guest computer and the server applications were ex- 
ecuted in two Host computers: one computer controlled the robot, and the 
other served as the observer. Fig. 3 shows the experimental configuration. The 
workspace, the Guest and the Host computers have the same setup described in 
[8]. The robot motor speeds ranged within ±10 pulses/10 ms and were held for 90 
ms. On the other side, the force feedback is provided by the joystick for 250 ms. 
The joystick is a MS Sidewinder FF2 connected to a USB (Universal Serial Bus) 
port of the Guest computer. The joystick controller was programmed in C++ 
including MS DirectX libraries, and it communicates via TCP (Transmission 
Control Protocol) with the Guest application written in Java. Fig. 4 (left) shows the 
teleoperation interfaces, which display the C-Space used in the experiments 
and the graphic robot mark that represents the current robot position. 
The graphic mark is a triangle that shows the robot’s direction and position. 
The triangle is drawn inside a dotted circle. These dots represent the virtual 
sensors around the robot and indicate its position. A dotted line is drawn 
indicating the predicted position of the robot based on the last applied velocity. The 
communication link is established using TCP for the teleoperation system. The 
number of compressed images sent by the teleoperation Host computer is reduced 
to avoid unnecessary delays introduced by the image transmission through the 
Internet. An image is sent to the Guest every n events (operator commands). 
Consequently, every s + n event an image is transmitted and displayed. The 
compressed images range from 16 to 19 Kbytes. 






4 Experimental Results 

This section describes one of several experiments performed. The results were
acquired using an Internet2 connection with 10 hops in the route, according to
traceroute’s output. The Guest was located at CIBNOR (Centro de Investigaciones
Biológicas del Noroeste, La Paz, Mexico), 1,050 km away from the Host system,
which was situated in the Robotics and Vision Laboratory of the CSI (Centro de
Sistemas Inteligentes, ITESM campus Monterrey, Mexico). Fig. 4 (right) shows the
robot’s trajectory on the (x, y) plane; it is 94.710 cm long. The potential field
around the obstacles is represented by boundary lines, and the virtual sensor
positions during the trajectory are indicated by the scattered points. The
trajectory plot has a resolution of 0.30625 cm per point. The trajectory was
completed in 163.445 s. The average time for a complete system cycle was 0.9080 s.
Fig. 5 presents the experiment behavior; the plots are referenced to the event s,
and 180 events (commands) were performed. The first row presents the desired
velocity $V_d(s)$ in pulses/10 ms versus s, for the left and right motors of the
robot. The second row shows the sum of the measured velocities $V_m(s)$ and the
corresponding y-axis force $F_{my}(s)$ versus s. The plots are clearly similar, as
expected, since the sum of the measured velocities and the force felt in the
y-axis should behave as the desired velocities. The third row illustrates the
force applied by the joystick in both x and y axes. The desired velocity $V_d(s)$
is decremented according to $P_r(s)$ and $\varphi(s)$, which are computed using the
measures of the virtual sensors on matrix M. If an obstacle is near the robot’s
front, then both left and right motor velocities decrease and an inverse force
vector increases in the y-axis of the joystick; the robot’s rear has the same
behavior. If an obstacle is near the right side of the robot’s front, then the
right motor velocity decreases and a negative force in the x and y axes of the
joystick increases, indicating that an obstacle is near in that direction; for
the left side, a negative force in the y-axis and a positive force in the x-axis
are generated, and vice versa for the robot’s rear. Thus a closer




Fig. 4. Left: the user interfaces for teleoperation shown on the Guest computer.
Right: the robot’s trajectory on the (x, y) plane

Fig. 5. The system behavior during a remote experiment



obstacle causes the tracking error to increase, and this error is proportionally
converted to a force vector by the joystick, to tell the user that an obstacle is
present in the robot trajectory. If the robot is located 3.675 cm from the
obstacle, then the left and right motors stop to avoid collision. When the virtual
sensors detect no obstacle around the robot, the measured velocity starts
tracking the desired one, and both the tracking error and force vector decrease.
The tracking error variation is influenced by the tracking error of the robot’s
PID controller, by the friction of the surface, and by the weight of the serial
cable attached to the robot. The most significant peaks shown on the y-axis force
plot are due to the presence of obstacles in the robot trajectory.
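As a quick consistency check on the figures reported in this section (a worked calculation from the stated values, not additional measurements), the average cycle time follows from the total duration and the number of events, and the mean speed along the path follows from its length:

```latex
\bar{t}_{\mathrm{cycle}} = \frac{163.445\ \mathrm{s}}{180} \approx 0.908\ \mathrm{s},
\qquad
\bar{v} = \frac{94.710\ \mathrm{cm}}{163.445\ \mathrm{s}} \approx 0.579\ \mathrm{cm/s}.
```

If the stop distance is measured on the same grid as the trajectory plot, which is an assumption, then 3.675 cm corresponds to 12 plot points at 0.30625 cm per point.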



5 Conclusions 

This paper presented a real-time control system for Internet-based teleoperation
of nonholonomic differentially-driven robots. The described control model
generates real-time sensory information using potential field and computer vision
techniques, in conjunction with an event-based controller. The implemented system
allows teleoperation of a mobile minirobot and provides the user with haptic and
visual feedback in real time, which enhances the control experience by making it
more natural and intuitive. Results obtained using an Internet2 connection
validate the design of this control model.


Abstract. We address the problem of the development of representations by an
agent and its relationship to the environment. A software agent develops a
representation of its environment through a network, which captures and
integrates the relationships between agent and environment through a closure
mechanism. A variable behavior modifier improves the representation development.
We report preliminary results in which we analyze two aspects: 1) The structural
properties of the resulting representation can be used as indicators of the
knowledge assimilated by the agent from the interaction with the environment.
These properties can be taken as useful macrovariables from an objective point of
view; and 2) The dynamics of the closure mechanism can be seen as the internal,
and therefore subjective, way used by the system to develop its representation.
We are interested not only in how the mechanism functions, but also in how the
representation evolves.

Keywords: Closure, representation development, behavior modifiers, af- 
fective states, biological motivations. 



1 Introduction 

Any cognitive agent must be able to start, distinguish, improve, and represent
actions in an autonomous manner. Language can be seen as intersubjective actions
and not just a code-decode system; therefore language can also be seen as an
embedded use-action activity with an implicit meaning. How can actions be
represented internally, and how can they be evolved, discovered and improved?
Answers to these questions can be seen as part of a process under a constructivist
or epigenetic approach. Under this approach, embodiment, social interaction,
development and integration are important specific topics, as Brooks has pointed
out [1]. Embodiment is a property any cognitive agent must have. It is associated
with the representation of the world: the body of the agent interacting with the
environment outlines, influences, and limits those representations. Some
Artificial Intelligence researchers have recognized the need for representations
in order to have a truly situated and cognitive agent [14]. Mitchell has pointed
out the need to integrate the dynamical and state approaches to representation,
which also has to be related to the dynamics of the agent-environment system [8].
Thus, we are interested in the dynamics of the interactions generating
representations, and therefore creating structure. Ziemke [19] has proposed five
notions of embodiment. Each of them has distinctive significance and relevance
for epigenetic phenomena. We are interested in the organismic embodiment of
autopoietic living systems, which is based on the idea that cognition is what
living systems do when they interact with their environment (and other organisms)
[7], [16]. This kind of embodiment includes development in all biological forms
of organization. Obviously, there is a clear difference between living organisms,
which are autonomous and autopoietic, and man-made machines, which are
heteronomous and allopoietic. So, how can an artificial autopoietic system be
built?

Steels has proposed that representations can be explicit (or symbolic) but
also implicit (or emergent) [14]. Zlatev and Balkenius [18] have proposed that
any epigenetic robot must have the ability to generate representations, but it
is not clear what internal representations are and how they are developed in
living systems. We assume that any external representation requires an internal
representation to acquire meaning. We think that internal representations are the
intertwined relationships between significant perceptions and significant actions
of an embodied agent interacting within an environment.

With this background, in this work we try to relate the dynamic and structural
descriptions in representation development. The paper is organized as follows:
Section 2 defines the pragmatic games used to explore representation emergence,
and describes how the pragmatic games were implemented and what the components of
the software agent are. Section 3 defines the concepts of closure and closure
states and describes how they are generated. It also defines behavior modulators
and presents the behavior modulator, focus, that is used to improve representation
development. In Section 4 we present two perspectives for analyzing the
experiments carried out: the internal description and the external description.
Finally, Section 5 closes the paper with discussion and future work.



2 Pragmatic Games and Experimental Setup 

To develop and analyze representations, we need a setup in which the agent
copes with enough similar situations to construct knowledge, as Piaget thought
[9]. The easiest way to do this is by interacting within an environment, one
simple enough to provide similar (but not identical) conditions. “Pragmatic games”
can be played to achieve this scenario: every time an event occurs, the scenario
restarts with similar conditions. The term “pragmatic games” is inspired by
“language games” [13], but we also used them as a methodology to study epigenetic
development. The agent can carry out pragmatic games as a result of its
capabilities and the environment’s characteristics. This means that the agent can
play the game and complete it with no more than its inborn capabilities. The
agent can move, allowing errors.

The pragmatic game used to test our ideas is the “feeding game”, a subset of the
micro-world used by Drescher [5] to study Piaget’s schemes. This involves a 2D
7 × 7 grid world in which there is an agent consisting of one 5 × 5 grid “eye”
with a central 1 × 1 square “fovea”, a 1 × 1 “hand”, and a 1 × 1 “mouth”. Within
the world, “objects” of size 1 × 1 can exist. The agent has four independent
actuators to move its hand and eye in the two spatial dimensions. The eye’s
movements are restricted to keeping the fovea within the world. The hand has the
same constraint. The displacements are discrete and equal in length to the side
of one cell, so that the eye and the hand always occupy complete world cells.
Each of the four actuators chooses randomly among three possible options:
decrease, maintain, or increase (-1, 0, or 1) the current positions of the eye
and hand in both dimensions $(e_x, e_y, h_x, h_y)$, constrained by the
environment. Each value is equally probable. An actuation is a set of four
actuator values. In the environment, an object can be placed randomly anywhere.
If the hand passes over the object, the object becomes attached to the hand. If
the hand holding the object passes over the mouth, the object is “fed”, a new
object appears at a random location, and the game starts again. Each one of the
25 cells of the eye senses the colors R, G, or B of the objects in the visual
field, sending 3 bits to the agent, one for each color. Therefore, the eye’s
sensing signal consists of 75 bits. The “hand” and the “mouth” each contribute
one bit to the total sensing vector if their position coincides with an object.
The agent has no proprioception, in the sense that it has no register of the
relative position of its hand, eye, or mouth. In addition to the sensing states,
we define a set of distinguishable innate biological motivations, whose incoming
information consists of a 5-bit vector: three bits for the fovea (one each for
detecting R, G or B), one for the “hand” and one for the “mouth”. Therefore,
there are potentially 32 different biological motivations, although in our simple
simulations fewer than ten are bootstrapped. They are only distinguishable, and
at the beginning they are not related to any sensorial state.
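The world and the two input vectors described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, clamping positions to the grid is one assumed reading of “constrained by the environment”, and the attach/feed rules are left out for brevity.

```python
import random
from dataclasses import dataclass

SIZE = 7          # 7 x 7 grid world
EYE_RADIUS = 2    # 5 x 5 eye centred on the fovea
COLORS = "RGB"


@dataclass
class World:
    eye: tuple        # fovea position (x, y)
    hand: tuple       # hand position (x, y)
    mouth: tuple      # mouth position (x, y)
    obj: tuple        # object position (x, y)
    obj_color: str    # "R", "G" or "B"


def clamp(v):
    """Keep a coordinate inside the grid (assumed boundary rule)."""
    return max(0, min(SIZE - 1, v))


def random_actuation(rng):
    """One value in {-1, 0, 1} per actuator (e_x, e_y, h_x, h_y), equally likely."""
    return tuple(rng.choice((-1, 0, 1)) for _ in range(4))


def apply_actuation(world, actuation):
    ex, ey, hx, hy = actuation
    world.eye = (clamp(world.eye[0] + ex), clamp(world.eye[1] + ey))
    world.hand = (clamp(world.hand[0] + hx), clamp(world.hand[1] + hy))


def sensing_vector(world):
    """75 eye bits (25 cells x R, G, B) plus one bit each for hand and mouth."""
    bits = []
    for dy in range(-EYE_RADIUS, EYE_RADIUS + 1):
        for dx in range(-EYE_RADIUS, EYE_RADIUS + 1):
            cell = (world.eye[0] + dx, world.eye[1] + dy)
            seen = cell == world.obj
            bits += [int(seen and world.obj_color == c) for c in COLORS]
    bits.append(int(world.hand == world.obj))
    bits.append(int(world.mouth == world.obj))
    return bits


def motivation_vector(world):
    """5 bits: fovea R, G, B, then hand and mouth contact."""
    fovea = [int(world.eye == world.obj and world.obj_color == c) for c in COLORS]
    return fovea + [int(world.hand == world.obj), int(world.mouth == world.obj)]
```

The 77-bit sensing vector and the 5-bit motivation vector are kept separate here because the text treats motivations as distinguishable states that are initially unrelated to any sensing state.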



3 Closure Mechanism 

We used small-world network theory to study the dynamics and structural
properties of the resulting network. The network will be strongly related to the
particular choice of the closure mechanism because this affects how nodes and
arcs are introduced into the network [15]. We consider a closure mechanism to be
a mechanism that probabilistically favors category formation [6].

In our experiments, the initial representation network is empty. The agent 
has as inputs the sensing states and biological motivations. Every time the agent 
experiences a particular biological motivation, a record is created saving the 
sensing states associated with it. 

Any distinguishable situation is considered a signal, and the agent has to
incorporate it into the representation. In a simplified way, a process is closed
if an actuation is related to the signals and the signals are related to the
actuation. In this sense, the closure mechanism must be a process that tries to
introduce relevant signals and actuations into the representation and to identify
whether or not they are related. In our directed graph, the nodes hold the signal
information and the arcs hold the actuation information.

The closure mechanism will create and incorporate relevant nodes and arcs,
modifying their status. When the links between nodes reach a “well formed”
class, they will be called facts, having some relation with the schemas of
Drescher [5] but constructed with different criteria and motivations¹.

Nodes which are affective states [11] are added by the agent to develop the
representation. Affective states represent any state of the agent that could
affect it, for better or worse. For our aim, this distinguishable character is
indispensable to filter information from the world and to establish organization;
measurements do not have this characteristic. Biological motivations are
considered affective states, since they are distinguishable. From this point of
view, sensing states are not affective states, because sensing by itself has no
relevance to the agent. A sensing state needs to be captured in a representation
in order to acquire some value (relative to the agent).

A sensing state can become an affective state if it becomes associated with some
value. An affective state can be seen as a signal. Signals can become related
through actuations, enabling the agent to develop a structured representation in
an autonomous way. The relationship between signals and actuations is structured
even when the actuations are random. This structure is reflected in two ways:
internally, during the closure mechanism, and externally, in the network
properties.

After a certain number of iterations, the system “falls” into a process that
tries to determine whether the biological motivations have some specific
associated sensing state. The sensing bits that are always present when the
biological motivation has been experienced are therefore represented in the
affective state; and if the sensing state at time t corresponds to an affective
state, then a node corresponding to the sensing state at time t - 1 is
incorporated into the representation, as well as the directed arc between the
nodes (representing the actuation). The new node is called a potential affective
state. A potential affective state becomes an affective state if its frequency
exceeds some value². The relationships between nodes (arcs) can be incorporated
into the representation in two ways: when a potential affective state is created,
and when the agent experiences two sensing states that have been associated with
existing nodes (affective states or potential affective states) in the
representation.



¹ For Drescher, a functional relationship is sought, with reliability as the final
test. For us, only a significant departure from randomness in actuations
establishes a deep relationship between affective states, stressing a
non-functional relationship between them.

² If the values are too small, noise can be learned. If the values are too big,
then it takes more time to learn. This also happens with other parameters of the
model.






C.R. de la Mora B., C. Gershenson, and V.A. Garcia-Vega



Every time an arc is crossed, the frequency and the performed actuation are
recorded in the arc. Once the arc exists in the network, its status can be
modified with the recorded information of the actuations during the process in
the following way: 1) If the frequency of occurrence of a specific arc is larger
than a given value, it becomes a frequent arc, and the probability distributions
of the actuators are computed from the history and saved in the arc in the form
$\{\{p(e_x=-1), p(e_x=0), p(e_x=1)\}, \{p(e_y=-1), p(e_y=0), p(e_y=1)\},
\{p(h_x=-1), p(h_x=0), p(h_x=1)\}, \{p(h_y=-1), p(h_y=0), p(h_y=1)\}\}$; 2) An arc
is considered codifiable if one of the 12 probabilities of a frequent arc is
greater than a threshold, since the nodes joined by the arc then have more than a
random link, as the movement could be codified for at least one actuator. If a
codifiable arc has affective states as source and target nodes, it is called a
fact. This is the most refined state for the closure mechanism in the development
of the arcs. This contrasts with Drescher’s perspective, which considers
reliability as the way to verify the arc’s functionality.
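The arc status rules above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the parameters `min_frequency` and `threshold` stand in for the “given value” and “threshold” mentioned in the text, whose actual values are not stated.

```python
from collections import Counter

ACTUATORS = ("e_x", "e_y", "h_x", "h_y")
VALUES = (-1, 0, 1)


def actuator_distributions(history):
    """Estimate p(actuator = value) from the actuations recorded on one arc.

    `history` is a non-empty list of 4-tuples (e_x, e_y, h_x, h_y).
    """
    n = len(history)
    dists = {}
    for i, name in enumerate(ACTUATORS):
        counts = Counter(act[i] for act in history)
        dists[name] = {v: counts[v] / n for v in VALUES}
    return dists


def arc_status(history, min_frequency, threshold):
    """Return the arc code: 1 (not frequent), 2 (frequent) or 3 (codifiable)."""
    if len(history) <= min_frequency:
        return 1
    dists = actuator_distributions(history)
    # Codifiable: at least one of the 12 probabilities exceeds the threshold.
    best = max(p for d in dists.values() for p in d.values())
    return 3 if best > threshold else 2


def is_fact(history, source_affective, target_affective, min_frequency, threshold):
    """A fact is a codifiable arc whose source and target are affective states."""
    return (source_affective and target_affective
            and arc_status(history, min_frequency, threshold) == 3)
```

With purely random actuations each of the 12 probabilities tends towards 1/3, so a threshold well above 1/3 only flags arcs on which at least one actuator departs clearly from randomness.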

Our method is imperfect at associating nodes in a strictly causal way, but it has
an advantage: it has no intention of reaching specific nodes or proving specific
arcs, avoiding any “cognitive” consideration. The representation is developed
through the imperfect agent’s possibilities and not under our preconceived “true
or false” considerations. This is because we are interested in the way the closure
process occurs and not in its success as “the best” fact constructor. We stress
that the important aspect is the network formed by facts, and that the closure
mechanism does not stop when an arc has achieved the fact character, but
continuously incorporates new arcs and nodes (which can become facts).



3.1 Closure States 

To identify the agent’s closure state associated with every pair of consecutive
sensing states, we use a set of three values (according to Table 1): one for the
sensing at time t - 1, another for the sensing at time t, and the third for the
state of the link between them.



Table 1. Codes of closure states

value   node                        arc
0       not in representation       not in representation
1       potential affective state   not frequent
2       affective state             frequent
3       -                           codifiable



The closure state has 3 × 3 × 4 = 36 possibilities. For the specified closure
mechanism, the state 223 has the highest “closure degree” and corresponds to
a “closed” arc or fact. The particular closure state is a subjective appreciation
for the agent, in the sense that it does not tell anything to an observer who does
not have precise knowledge of the mechanism.
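The three-digit closure code can be made concrete with a small sketch. The dictionary and function names are hypothetical; the codes themselves are those of Table 1.

```python
from itertools import product

NODE_CODES = {
    0: "not in representation",
    1: "potential affective state",
    2: "affective state",
}
ARC_CODES = {
    0: "not in representation",
    1: "not frequent",
    2: "frequent",
    3: "codifiable",
}


def closure_state(prev_node, curr_node, arc):
    """Encode (node at t-1, node at t, arc between them) as a string such as '223'."""
    if prev_node not in NODE_CODES or curr_node not in NODE_CODES:
        raise ValueError("node code must be 0, 1 or 2")
    if arc not in ARC_CODES:
        raise ValueError("arc code must be 0, 1, 2 or 3")
    return f"{prev_node}{curr_node}{arc}"


# Enumerate every combination of two node codes and one arc code.
ALL_STATES = [closure_state(a, b, c)
              for a, b, c in product(NODE_CODES, NODE_CODES, ARC_CODES)]
```

The enumeration simply lists every code combination; it does not claim that all 36 are reachable by the mechanism.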
3.2 Behavior Modulators

We are interested in the emotional modulation of cognition [3,4]. In this sense,
emotions are the observable result of a particular set of values for the behavior
modulators. We consider a humble approach to Dörner’s theory [4] to test the idea
that modifying the behavior according to the current knowledge state (the closure
state) of an agent can help it obtain more knowledge. We use a single behavior
modulator, called focus. This parameter, with values between 0 and 1, modulates
the probability of revisiting the previous sensing state by undoing the last
performed movements. It is called focus because it is a mechanism used to perceive
something again by trying (with a probabilistic measure) to revisit a situation.
A focus with value 0.0 means that the agent will always move in a random
direction. A focus with value 1.0 means that the agent will always undo the last
movement. The different focus values can be interpreted by an observer as
different emotional states. For example, a high focus value could be seen as
“interest”.
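One way to read the focus parameter is as the probability of replaying the negated last actuation. The sketch below follows that reading; the function name is hypothetical, and negating the last actuation only revisits the previous state when the earlier move was not cut short by the world boundary, which is an assumption of this example.

```python
import random


def next_actuation(last_actuation, focus, rng):
    """With probability `focus`, undo the last movement; otherwise move at random.

    `last_actuation` is a 4-tuple (e_x, e_y, h_x, h_y) or None on the first step.
    """
    if last_actuation is not None and rng.random() < focus:
        return tuple(-v for v in last_actuation)
    return tuple(rng.choice((-1, 0, 1)) for _ in range(4))


rng = random.Random(0)
undo = next_actuation((1, 0, -1, 1), focus=1.0, rng=rng)     # always undoes
wander = next_actuation((1, 0, -1, 1), focus=0.0, rng=rng)   # always random
```

Because `rng.random()` returns values in [0, 1), a focus of 1.0 always undoes and a focus of 0.0 never does, matching the two limiting cases in the text.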



4 Experiments with Internal Representation 

The experiments are described from two perspectives: an internal one,
corresponding to the dynamics of the closure mechanism as it tries to incorporate
signals and actuations, and an external one, corresponding to the network’s
quantifiers taken as macrovariables. We are interested in the way the focus
affects the representation’s development given the system and the agent’s closure
mechanism. Every time the system changes closure state, either by incorporating a
node or an arc or by changing their status in the representation, the agent has
integrated more knowledge from the environment and its interrelationship,
captured in the network.



4.1 Closure Dynamics with Constant Focus 

In a first set of experiments, we performed runs of the feeding game, varying the
focus value. All of the agent’s movements are random, but when the focus value is
greater than 0, the last movement can be undone. The closure dynamics, or
knowledge incorporation process, builds a probabilistic network in which closure
states (nodes) and transitions between them (arcs) are weighted by their frequency
of occurrence.

Two types of arcs are recognized: loops and simple transitions. Loops represent
time without knowledge acquisition. Table 2 shows the relative frequencies of some
loops for each focus. As the last row of the table shows, the agent is engaged in
loops for at least 83% of the total time. During loops, there is no
distinguishable change in the closure mechanism. The number of loops changes
according to the focus value; a lower loop frequency indicates less time without
changes in the representation, and hence faster learning. The focus plays a
different role in learning depending on the specific loop. There is no single
“best” focus value; rather, the “optimal” value depends on the current (internal)
context.

Table 2. Relative frequencies of loops (±0.01)

             Focus value
loops        0      0.25   0.5    0.75   var
222-222      0.31   0.26   0.25   0.23   0.17
000-000      0.12   0.15   0.16   0.24   0.09
223-223      0.09   0.09   0.12   0.14   0.19
221-221      0.08   0.06   0.07   0.08   0.14
121-121      0.05   0.06   0.06   0.05   0.09
111-111      0.05   0.05   0.04   0.03   0.01
211-211      0.05   0.05   0.05   0.04   0.07
100-100      0.04   0.05   0.05   0.04   0.03
010-010      0.04   0.05   0.05   0.04   0.02
200-200      0.03   0.02   0.02   0.01   0.02
total        0.86   0.85   0.87   0.91   0.83



When the system experiences a change in the closure state, we can say that the
system has incorporated structure into the representation. Table 3 shows the
relative frequencies. The total time in which the agent develops its
representation (transitions) is lower than the time devoted to loops.



Table 3. Relative frequencies of transitions (±0.01)

             Focus value
transitions  0      0.25   0.5    0.75   var
110-111      0.05   0.05   0.04   0.01   0.03
020-021      0.03   0.03   0.02   0.01   0.03
210-211      0.02   0.03   0.03   0.02   0.04
120-121      0.01   0.01   0.02   0.01   0.02
221-223      0.01   0.01   0.01   0.01   0.02
220-221      0.01   0.01   0.01   0.01   0.02
121-223      0.00   0.00   0.00   0.00   0.01
total        0.14   0.15   0.13   0.09   0.17



4.2 Taking Advantage of the Closure’s Dynamics: Variable Focus 

After analyzing the probabilistic networks corresponding to the closure structures
obtained with each of the focus values, we repeated the feeding game experiment,
this time changing the focus value as a function of the current closure state
according to the following rule: if closureState ∈ {222, 221, 211, 121, 212, 223,
213} then focus = 0.66, else focus = 0.0. The system changes its focus value in a
reactive way, depending only on the current closure state, not on the sensing
states. The goal is not to find the “best” focus value for each closure state, but
just to show that modifying the behavior modulator in terms of the closure state
affects the acquired knowledge, yielding a different structure of the global
representation. The results are shown in the last column of Table 2 and Table 3.
We can see that the loops in which the agent spends the most time decrease
appreciably, and that the transition frequencies rise or are maintained, except
for the 110-111 case. A detailed analysis of the data shows that the paths to the
state 223 (facts) are favored.
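The variable-focus rule is purely reactive and can be written directly from the text. The constant and function names below are hypothetical; the state set and the two focus values are those given above.

```python
# Closure states for which the agent tries to revisit the previous situation.
FOCUSED_STATES = {"222", "221", "211", "121", "212", "223", "213"}


def variable_focus(closure_state: str) -> float:
    """Return the focus value for the current closure state."""
    return 0.66 if closure_state in FOCUSED_STATES else 0.0
```

The rule looks only at the closure state, so two different sensing situations that share a closure code receive the same focus.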

4.3 External Description: Network Properties 

Both loops and transitions, as considered before, are characteristic of the
closure mechanism, reflecting the dynamics during the development of the network.
We can calculate the closure state distribution obtained from the resulting
representation considering all the existing arcs and their associated nodes, as
shown in Table 4. This distribution can be considered an external observation,
because it is a “picture” of the representation at a certain time, but it does not
give information on how the arcs have obtained their closure state. We can observe
again that the focus affects the closure state of arcs in different ways. For
facts, there is no significant variation with fixed focus values, but their
frequency increases considerably (≈ 50%) with variable focus.



Table 4. Closure state of arcs of final representation

             Focus
arcs         0      0.25   0.5    0.75   var
121          586    539    503    243    586
221          386    338    271    272    409
211          264    341    364    191    447
Fact: 223    213    188    204    211    307
111          370    477    337    97     196
222          59     59     43     43     94
213          6      1      8      4      6
212          1      1      2      0      0
num arcs     1885   1944   1732   1061   2044



We used the clustering coefficient [17] of the representation network as a
measure of global structure. The clustering coefficient shows that the most
efficient and most stable case is that of variable focus. The variable behavior
modulator acts internally to produce improvements in the external structure. In
our model, what is sensed turns into a signal internally through the closure
process, and the result can be measured externally in the structural properties of
the representation network. We also analyzed the subnets related to each
biological motivation (data not shown). Each subnet is similar to a scheme, more
in the Piagetian sense than in Drescher’s sense. This is because the subnet
corresponds to structured knowledge with some biological meaning and not to
concrete context-actuation-result detection. The structural properties of subnets
are better with a variable focus than with a fixed one. There are also more
subnets with a variable focus, giving the possibility of developing more schemes.
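For reference, the average clustering coefficient of a network can be computed as sketched below. This is an assumption-laden illustration rather than the computation used in the experiments: the directed representation network is treated as undirected, self-loops are ignored, and nodes with fewer than two neighbors contribute zero, which is one common convention.

```python
from itertools import combinations


def clustering_coefficient(edges):
    """Average local clustering coefficient of an undirected graph.

    `edges` is an iterable of (u, v) pairs; direction and self-loops are ignored.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # counted as 0 in the average
        # Number of links that exist among this node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors)
```

A triangle gives 1.0 and a simple path gives 0.0, so higher values indicate that the neighbors of a node tend to be linked to each other as well.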
5 Conclusions and Future Work

In this work, we try to isolate the process by which a representation evolves,
avoiding any use of “cognition” about the “state of the world”, to explore two
aspects: 1) a case study on how the internal and external modes of description
can be related, and 2) how behavior modifiers based on internal considerations
can affect the structural properties of the developed representation. The closure
mechanism is in itself a knowledge acquisition mechanism, in the sense that it
incorporates into the representation the structural relationships with the
environment. Using a behavior modulator called focus, the representation and its
structure can develop in different ways. A selective value for specific closure
states improves the structural properties of the representation (external mode of
description).

Our model is neither behavior-based nor knowledge-based. It is atypical for a
“knowledge acquisition mechanism”, since the agent does not react directly to its
world. However, the focus can modify the behavior patterns. The variable focus
allows the agent to react to its knowledge state in order to incorporate the
relationships with its environment faster. The obtained representation does not
capture structural properties of the environment, but makes explicit the
structural interactions between the agent and the environment. We have avoided any
use of the representations, although they have the potential to be used.
Structural or historical embodiment alone is not enough to obtain rich
representations autonomously. It seems the same as thinking only about
affordances. We also need to consider an internal process independent of the
world’s dynamics, so that the representation becomes richer. Note that these
dynamics are different from those related to the use of representations. In the
agent’s life, both dynamics are crucial and must be related, but at this moment we
are concentrating on building representations. Our work can be seen as a kind of
representation learning without explicit goals, i.e., an implicit learning
process.

As future work, we see several directions that could be followed. Intersubjective
representations could be obtained by pragmatic games in which two or more agents
interact with an environment. This topic is interesting for studies in the
evolution of communication. Another direction would be to study the effect of
different behavior modifiers on the development of the representation.


Abstract. A probabilistic roadmap is a network of simple paths connecting
collision-free configurations obtained by sampling a robot’s configuration space
at random. Several probabilistic roadmap planners have solved unusually difficult
path planning problems, but their efficiency remains disappointing when the free
space contains narrow passages. This paper provides a new technique to find free
configurations in narrow corridors, sampling the configuration space using
geometric features of the workspace and computing configurations close to the
obstacles. An initial roadmap is built at low cost using spheres; next, a
connectivity-improvement phase based on “straightness”, “volume” and “normal
vector” features of the workspace is computed, and the roadmap is improved to
capture a better connectivity of the configuration space. Experiments show that
the new approach is able to solve different benchmarks in motion planning problems
containing difficult narrow corridors.

Keywords: Probabilistic roadmap methods, geometric characteristics, 
robotics. 



1 Introduction 

Motion planning in the presence of obstacles is an important problem in robotics
with applications in other areas, such as simulation and computer-aided design.
While complete motion planning algorithms do exist, they are rarely used in
practice since they are computationally infeasible in all but the simplest cases.
For this reason, probabilistic methods have been developed. In particular, several
algorithms, known collectively as probabilistic roadmap planners, have been shown
to perform well in a number of practical situations; see, e.g., [9]. The idea
behind these methods is to create a graph of randomly generated collision-free
configurations, with connections between these nodes made by a simple and fast
local planning method. These methods run quickly and are easy to implement;
unfortunately, there are simple situations in which they perform poorly, in
particular situations in which paths are required to pass through narrow passages
in configuration space [11].



C. Lemaitre, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 514-523, 2004. 

Workspace geometry has been used in motion planning to propose several 
heuristics based on the medial axis or generalized Voronoi diagrams, 
see [4, 10]. In particular, in two dimensions the medial axis is a 
one-dimensional graph-like structure which can be used as a roadmap. However, 
the medial axis is difficult and expensive to compute explicitly, particularly 
in higher dimensions. 

1.1 Our Results 

We propose a new algorithm which combines these two approaches by 
generating random networks whose nodes lie on the obstacle surfaces. Our central 
observation is that it is possible to improve the connectivity of configuration 
space by using geometric features of the workspace to find free configurations 
close to the obstacles. The main novelty of our approach is a new method for 
generating roadmap candidate points. In particular, we attempt to generate 
candidate points distributed close to each obstacle in the workspace, taking 
advantage of their geometric features. Using this approach, high quality 
roadmaps can be obtained even when the workspace is crowded. Experimental 
results with “free flying objects” with six degrees of freedom (dof) will be 
shown. 

Previous results using the new proposal have been presented in [5]; in this 
paper we have improved the heuristic by adding a new feature, called “normal 
vector”. 

The approach extends fairly easily to dynamic environments. Our approach 
can be applied to some important situations that have so far not been 
satisfactorily solved by heuristic methods (paths through long, narrow passages 
in a crowded workspace can be found). 

The previous approaches most closely related to ours are the path planning 
methods of Kavraki and Latombe [8] and of Overmars and Svestka [12, 13]. In 
fact, in [13] the authors describe a technique they call geometric node adding, 
in which roadmap nodes are generated from robot configurations near obstacle 
boundaries, which is very similar to the idea of generating nodes on C-obstacle 
boundaries [3]. 

We describe how the roadmap is constructed in Section 2, and how it is used 
for planning in Section 3. Implementation details and experimental results are 
presented in Section 4 and 5, respectively. 

1.2 Probabilistic Roadmap Methods 

Probabilistic roadmap methods generally operate as follows, see, e.g., [9]. During 
a preprocessing phase, a set of configurations in the free space is generated by 
sampling configurations at random and removing those that put the workpiece in 
collision with an obstacle. These nodes are then connected into a roadmap graph 
by inserting edges between configurations if they can be connected by a simple 
and fast local planning method, e.g., a straight line planner. This roadmap can 
then be queried by connecting given start and goal configurations to nodes in 
the roadmap (again using the local planner) and then searching for a path in the 
roadmap connecting these nodes. Various sampling schemes and local planners 
have been used, see [2, 7, 13]. The algorithms are easy to implement, run quickly, 
and are applicable to a wide variety of robots. 
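
The preprocessing phase described above can be sketched as follows. This is a 
minimal illustration for a point robot in the unit square, not the planner used 
in this paper; the function names, the axis-aligned box obstacles and the step 
size of the straight-line local planner are assumptions made for the example. 

```python
import math
import random


def is_free(q, obstacles):
    """True if point q = (x, y) lies outside every box (x0, y0, x1, y1)."""
    x, y = q
    return not any(x0 <= x <= x1 and y0 <= y <= y1
                   for (x0, y0, x1, y1) in obstacles)


def straight_line_free(a, b, obstacles, step=0.01):
    """Straight-line local planner: test evenly spaced points on segment a-b."""
    n = max(1, int(math.dist(a, b) / step))
    return all(
        is_free((a[0] + (b[0] - a[0]) * t / n,
                 a[1] + (b[1] - a[1]) * t / n), obstacles)
        for t in range(n + 1)
    )


def build_roadmap(n_nodes, k, obstacles, rng):
    """Sample free configurations, then try to link each to its k nearest."""
    nodes = []
    while len(nodes) < n_nodes:
        q = (rng.random(), rng.random())
        if is_free(q, obstacles):      # discard configurations in collision
            nodes.append(q)
    edges = {i: set() for i in range(n_nodes)}
    for i, q in enumerate(nodes):
        by_distance = sorted(range(n_nodes),
                             key=lambda j: math.dist(q, nodes[j]))
        for j in by_distance[1:k + 1]:  # index 0 is the node itself
            if straight_line_free(q, nodes[j], obstacles):
                edges[i].add(j)
                edges[j].add(i)
    return nodes, edges
```

A query then connects the start and goal configurations to this graph with the 
same local planner and searches the graph for a path. 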
The main shortcoming of these methods is their poor performance on 
problems requiring paths that pass through narrow passages in the free space 
(C_free). This is a direct consequence of how the nodes are sampled. For 
example, under the usual uniform sampling over C-space, any corridor of 
sufficiently small volume is unlikely to contain any sampled nodes whatsoever. 
Some effort has been made to modify the sampling to increase the number of 
nodes sampled in narrow corridors. Intuitively, such narrow corridors may be 
characterized by their large ratio of surface area to volume; the methods in 
[2] and [6] have exploited this idea. 
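
A short calculation makes the sampling problem concrete. Assume, for 
illustration only, that the corridor occupies a fraction \mu of the volume of 
C-space and that n configurations are drawn independently and uniformly (the 
symbols \mu and n and the numbers below are introduced here and do not come 
from the experiments). The probability that no sample falls in the corridor 
is then: 

```latex
P(\text{no node in the corridor}) = (1 - \mu)^{n} \approx e^{-n\mu},
\qquad
\mu = 10^{-4},\; n = 1000 \;\Longrightarrow\; (1 - 10^{-4})^{1000} \approx 0.905 .
```

With these values, roughly nine roadmaps out of ten would contain no node at 
all inside the corridor. 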

2 Roadmap Construction 

The tasks required for building the roadmap are generating the roadmap 
candidate nodes and connecting the candidates to form the roadmap. The 
description of these tasks below is for a free-flying rigid body in three 
dimensions. 




Fig. 1. The geometric center (mass center): in the first sample the 
geometric center lies inside the body; in the second one it falls outside 
the mesh 

Fig. 2. a) The rotation axis is defined in the same direction as the 
straightness feature, b) The volume feature is proportional to the sphere 
volume 



2.1 Algorithm Description 

The description of the algorithm is divided into four parts: the geometric 
features to be used, the first approximation of the configuration space, the 
improvement of connectivity, and planning. The following sections describe how 
the algorithm works and give some details about its implementation. Some 
figures are included to show the main idea behind this new proposal. 

2.2 Geometric Features 

The workspace is read from different files; these files contain the triangle 
meshes which represent the geometry of each body in the workspace. Using this 
representation, the algorithm computes several parameters which will be used by 
the heuristic and are defined as follows: 

Mass Center. This metric is calculated as the average of the x, y, and z 
values of the vertices of a given body in the environment. This point 
might not lie inside the object, as can be seen in figure 1. 

The Body Radius. This parameter is computed as the distance between the 
mass center and the farthest vertex of the object. This metric is used to 
calculate the sphere that surrounds the object. Figure 1 shows the position 
of the mass center and the radius computed using this metric. 
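
The two parameters above can be sketched directly from their definitions. The 
function names and the representation of a body as a plain list of (x, y, z) 
vertex tuples are assumptions made for this example. 

```python
import math


def mass_center(vertices):
    """Average of the x, y and z values of the vertices of a body."""
    n = len(vertices)
    return tuple(sum(v[i] for v in vertices) / n for i in range(3))


def body_radius(vertices, center):
    """Distance from the mass center to the farthest vertex."""
    return max(math.dist(center, v) for v in vertices)


# Example: the eight corners of a unit cube.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
center = mass_center(cube)
radius = body_radius(cube, center)   # half of the cube diagonal
```

The sphere with this center and radius encloses every vertex of the mesh, 
which is what the first sampling stage relies on. 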

Straightness. Let v_i be the vector which defines the direction of the 
“straightness” feature for each V_i ∈ B, and let v_r define the same feature on 
the robot. This feature indicates the direction along which the body presents 
its long side. Figure 2 a) shows this feature for a given object. 

Volume. Let vol_i be the “volume” of the obstacle V_i relative to the volume of 
the sphere used to surround it. Both the “straightness” and “volume” features 
are used to improve the connectivity of the roadmap; figure 2 b) shows 
a geometric representation of this parameter. 

Normal Vector. The normal vector, often simply called the “normal,” to a 
surface is a vector perpendicular to it. Often, the normal unit vector is desired, 
which is sometimes known as the “unit normal.” 
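
For a triangle mesh, the unit normal of each triangle follows from the cross 
product of two of its edges. The sketch below also computes the geometric 
center of a triangle, which the heuristic uses later; the function names and 
the tuple representation of vertices are assumptions made for this example. 

```python
import math


def triangle_center(a, b, c):
    """Geometric center of the triangle with vertices a, b and c."""
    return tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))


def unit_normal(a, b, c):
    """Unit vector perpendicular to the triangle (right-hand rule on a, b, c)."""
    u = [b[i] - a[i] for i in range(3)]
    w = [c[i] - a[i] for i in range(3)]
    n = (u[1] * w[2] - u[2] * w[1],
         u[2] * w[0] - u[0] * w[2],
         u[0] * w[1] - u[1] * w[0])
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("degenerate triangle: the vertices are collinear")
    return tuple(x / length for x in n)
```

Reversing the vertex order flips the sign of the normal, so a consistent 
vertex ordering in the mesh files is assumed. 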

2.3 First Sampling of C-Space 

During this stage, the algorithm uses spheres to surround the robot and the 
obstacles. Figure 4 shows a view of the first sampling. Using only spheres 
during this stage has two advantages: first, the robot can rotate in any 
direction, which will be used by the local planner in the improving stage; 
second, the cost of collision detection is reduced, because the routine only 
has to detect when two spheres are in collision (the sphere associated with 
each obstacle and the one associated with the robot). In this process only a 
small number of configurations is computed. The main idea of this stage is to 
place configurations in open space while spending little time. The technique 
may miss some free configurations, but this proposal focuses on computing 
difficult configurations, and the missed configurations can be calculated in 
the second stage. 
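
A minimal sketch of this stage is given below. It samples only the position of 
the robot sphere, since a sphere is unaffected by rotation; the function names, 
the box-shaped sampling bounds and the cap on the number of attempts are 
assumptions made for the example. 

```python
import math
import random


def spheres_collide(c1, r1, c2, r2):
    """Two spheres collide when their centers are closer than r1 + r2."""
    return math.dist(c1, c2) <= r1 + r2


def first_sampling(robot_radius, obstacle_spheres, bounds, n_nodes, rng,
                   max_tries=10000):
    """Sample robot positions whose sphere misses every obstacle sphere.

    obstacle_spheres is a list of (center, radius) pairs and bounds is a
    list of (low, high) pairs, one per coordinate.
    """
    nodes = []
    for _ in range(max_tries):
        if len(nodes) == n_nodes:
            break
        q = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if not any(spheres_collide(q, robot_radius, c, r)
                   for c, r in obstacle_spheres):
            nodes.append(q)
    return nodes
```

Because the enclosing spheres are conservative, every position kept here is 
free for any orientation of the robot, at the price of rejecting positions 
that lie close to an obstacle. 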

2.4 Improving the Connectivity of C-Space 

If the number of nodes computed during the first approximation of the roadmap 
is large enough, the set N gives a fairly uniform covering of C_free. In easy 
scenes the roadmap R is well connected. But in more constrained ones, even 
where C_free is actually connected, R often consists of a few large components 
and several small ones. It therefore does not effectively capture the 
connectivity of C_free. 

The purpose of the expansion is to add more nodes in a way that will 
facilitate the formation of large components comprising as many of the nodes 
as possible, and that will also help cover the more difficult narrow parts of 
C_free. 

In this phase, we generate a set N of candidate roadmap nodes, each of which 
corresponds to a point in C-space. The general strategy of the node generation 
process is to construct a set N_i of candidate nodes for each object V_i such 
that each c(V_i) ∈ N_i lies near obstacle V_i. The set of roadmap candidate 
nodes is the union of the candidate sets computed for each obstacle V_i ∈ B, 
i.e., N = ∪_i N_i. 

Fig. 3. Geometric representation of the normal vector 

Fig. 4. First sampling of configuration space, using spheres to surround the 
objects and reduce the cost of collision detection during this stage 

Node Generation. We now consider how to compute the candidate set for 
each obstacle. During this stage the technique attempts to take advantage of 
some geometric features of the robot and the obstacles to obtain information 
that allows it to guide the search for useful configurations. 

Parallel and Perpendicular Configurations. The first geometric feature 
used is called “Straightness”. This feature indicates the direction along which 
the object presents its long side, and it is given by a vector v_i, which we 
use as the direction of the rotation axis. Figure 2 a) shows how the 
rotation axis is represented on the robot and the obstacles. Because the 
rotation axis is defined in the same direction as the straightness feature, the 
object sweeps a small volume when it rotates. 

The main idea behind the parallel and perpendicular configurations is to 
compute configurations close to the obstacles and to place the robot in such a 
way that, when the algorithm attempts to rotate the robot about its rotation 
axis (searching for a free configuration), the volume swept is small and 
collisions are avoided. 

We now describe how to generate m points close to the obstacle V_i. Using the 
straightness feature, the following algorithm computes parallel and 
perpendicular configurations. 

1. First, the algorithm searches for a collision configuration c(V_i) on the 
obstacle V_i; each c(V_i) is drawn uniformly around the sphere that surrounds 
the obstacle (a vicinity is defined for each V_i). 

2. Next, the technique attempts to rotate this configuration until it is 
parallel to the obstacle (that is, v_r ∥ v_i). If the parallel configuration is 
not in collision then it is added to N; otherwise the process called elastic 
band is applied, trying to turn it into a free configuration that lies close to 
the obstacle V_i. 

3. Once a parallel configuration has been processed, the algorithm computes 
the perpendicular configuration (that is, v_r ⊥ v_i), taking the c(V_i) 
calculated in step 1. As in step 2, if the new perpendicular configuration is 
not in collision then it is added to N_i; otherwise the elastic band process 
is applied. Figure 5 shows the parallel and perpendicular configurations 
around V_i. 

The second strategy for calculating free configurations close to the obstacles 
is to place the robot parallel to the direction of a normal vector of some 
triangle on the obstacle surface. 

To do so, the heuristic calculates the geometric center of each triangle in the 
mesh and obtains the normal vector of each triangle (this is done in a 
preprocessing phase, for each body in the environment). 

The idea behind this process is to compute free configurations close to the 
obstacle surfaces. After the process has calculated the parallel and 
perpendicular configurations, the heuristic attempts to obtain random 
configurations using the direction of the normal vector as the direction of 
the rotation axis. 
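
Placing the robot parallel or perpendicular to an obstacle, or along a triangle 
normal, amounts to rotating the robot so that its straightness vector points in 
a chosen direction. The sketch below builds such a rotation with Rodrigues' 
formula; this particular construction and all function names are assumptions 
made for illustration, since the text does not say how the rotation is 
computed. 

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def any_perpendicular(v):
    """Some unit vector perpendicular to v (used for perpendicular placement)."""
    v = normalize(v)
    helper = (1.0, 0.0, 0.0) if abs(v[0]) < 0.9 else (0.0, 1.0, 0.0)
    return normalize(cross(v, helper))


def rotation_aligning(src, dst):
    """3x3 rotation matrix that turns direction src into direction dst."""
    a, b = normalize(src), normalize(dst)
    c = max(-1.0, min(1.0, dot(a, b)))      # cosine of the rotation angle
    axis = cross(a, b)
    s = math.sqrt(dot(axis, axis))          # sine of the rotation angle
    if s < 1e-12:
        if c > 0.0:                          # already aligned
            return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
        k, s = any_perpendicular(a), 0.0     # opposite: half turn
    else:
        k = tuple(x / s for x in axis)
    kx, ky, kz = k
    t = 1.0 - c
    return ((c + t * kx * kx, t * kx * ky - s * kz, t * kx * kz + s * ky),
            (t * kx * ky + s * kz, c + t * ky * ky, t * ky * kz - s * kx),
            (t * kx * kz - s * ky, t * ky * kz + s * kx, c + t * kz * kz))


def apply(rotation, v):
    """Rotate vector v by a 3x3 matrix given as nested tuples."""
    return tuple(dot(row, v) for row in rotation)
```

A parallel configuration would use `rotation_aligning(v_r, v_i)`, a 
perpendicular one `rotation_aligning(v_r, any_perpendicular(v_i))`, and the 
second strategy the normal of a triangle in place of v_i. 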

The Elastic Band Algorithm. This process works as follows. First, the 
heuristic calculates the distance vector d_i between the obstacle position and 
the configuration c(V_i), and attempts to move the robot toward and away from 
the obstacle. To do this, the process scales the vector d_i (using the values 
computed from the “volume” feature) to calculate the next position where 
c(V_i) will be placed. 

The metric used to obtain the distance vector is the Euclidean distance in 
three-dimensional space, and the scale factor is computed using the “volume” 
feature of the objects. The scale factor is initialized to a low value (MIN) 
and increased at each iteration until it reaches the maximum value (MAX); 
these values are presented in figure 7. This metric plays an important role 
because we are interested in configurations close to the objects, which means 
that the mass centers of the objects have to be near each other. 

Fig. 5. Parallel and perpendicular configurations computed close to the 
object 

Fig. 6. Elastic band process, approaching and moving away the configuration 
from the obstacle using a scalar metric on the distance vector d_i 

Volume value and scale parameter for the Elastic Band Algorithm 

Volume value                  MIN value for the    MAX value for the 
                              scale parameter      scale parameter 
Vol <= 0.25                   0.050                1.0 
Vol > 0.25 && Vol <= 0.50     (illegible)          (illegible) 
Vol > 0.50 && Vol <= 0.75     0.25                 1.5 
Vol > 0.75                    (illegible)          2.0 

Fig. 7. Values of the MIN and MAX parameters used during the elastic band 
process 

Fig. 8. Final connectivity of configuration space (after applying the Elastic 
Band process) 

The elastic band process works with the parallel and perpendicular 
configurations computed close to the obstacle. While the distance vector is 
used to compute the next configuration to be tested, the robot is rotated about 
its rotation axis, searching for a free configuration (taking advantage of the 
straightness feature). Figure 6 shows how the parallel and perpendicular 
configurations are computed near the obstacle and how the scaled vector 
changes, moving the robot toward and away from the obstacle. The configurations 
marked with dots are those calculated during the process until a free one is 
reached. 

Once the improving strategy has been applied, the connectivity of the 
roadmap can be seen in figure 8. The figure presents the configurations 
calculated during the first approximation (which are surrounded by a sphere), 
and we can see how the heuristic is able to place many configurations in narrow 
regions to reach a better connectivity of configuration space. 
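
The translational part of the elastic band process can be sketched as below. 
This is a simplified illustration: the rotation about the straightness axis 
that accompanies each step is omitted, the scale factor grows linearly from MIN 
to MAX in a fixed number of steps, and the function name, the `in_collision` 
callback and the number of steps are all assumptions. 

```python
import math


def elastic_band(obstacle_center, config_pos, in_collision,
                 s_min, s_max, steps=20):
    """Slide a configuration along the obstacle-to-configuration vector.

    The distance vector d is scaled by a factor that grows from s_min to
    s_max; the first scaled position that is not in collision is returned,
    or None when every tested position collides.
    """
    d = tuple(q - o for q, o in zip(config_pos, obstacle_center))
    for k in range(steps + 1):
        scale = s_min + (s_max - s_min) * k / steps
        candidate = tuple(o + scale * di
                          for o, di in zip(obstacle_center, d))
        if not in_collision(candidate):
            return candidate
    return None


# Example with an enclosing-sphere collision test (hypothetical radii):
# obstacle radius 1.0 at the origin, robot radius 0.5.
def hits_obstacle(p):
    return math.dist(p, (0.0, 0.0, 0.0)) <= 1.5


free = elastic_band((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), hits_obstacle, 0.05, 2.0)
```

Starting from small scale factors keeps the accepted configuration as close to 
the obstacle as the collision test allows. 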

2.5 Connecting Roadmap Candidates 

We now consider how to connect the candidate nodes N = ∪_i N_i to create the 
roadmap. The basic idea is to use a simple, fast, local planner to connect 
pairs of roadmap candidate nodes. 

Ideally, the roadmap will include paths through all corridors in C-space. Thus, 
a trade-off exists between the quality of the resulting roadmap and the 
resources (computation and space) one is willing to invest in building it. 

Many different connection strategies could be used in path planning 
applications. In the previous work [5] we only used the method of [8], trying 
to connect each node c(V_i) ∈ N to its k nearest neighbors in N. 

3 Planning 

Planning is carried out as in any roadmap method: we attempt to connect the 
nodes x_1 and x_2, representing the start and goal configurations, respectively, 
to the same connected component of the roadmap, and then find a path in the 
roadmap between these two connection points. The following approach, proposed 
in [8], is well suited for our roadmap. 

If no connection is made for x_i, then we execute a random walk and try to 
connect the initial or the end node to the roadmap. This can be repeated a few 
times if necessary. If we still cannot connect both nodes to the same connected 
component of the roadmap, then we declare failure. After both connections are 
made, we find a path in the roadmap between the two connection points using 
Dijkstra’s algorithm. Recall that we must regenerate the path between adjacent 
roadmap nodes since they are not stored with the roadmap. 
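
The graph search in the last step can be sketched as follows, assuming the two 
connection points are already roadmap nodes. The data layout (a list of node 
coordinates plus a dictionary of neighbor indices) and the use of Euclidean 
edge lengths are assumptions made for this example. 

```python
import heapq
import math


def shortest_path(nodes, edges, start, goal):
    """Dijkstra's algorithm on a roadmap.

    nodes is a list of coordinate tuples and edges maps a node index to its
    neighbor indices. Returns the list of node indices from start to goal,
    or None when the two nodes lie in different connected components.
    """
    best = {start: 0.0}
    previous = {}
    visited = set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            path = [u]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return path[::-1]
        for v in edges[u]:
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < best.get(v, math.inf):
                best[v] = nd
                previous[v] = u
                heapq.heappush(heap, (nd, v))
    return None
```

A `None` result corresponds to the failure case described above, in which the 
start and goal cannot be joined to the same connected component. 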




Fig. 9. The robot has a more complex shape 

Fig. 10. This sample shows two grids; the robot has to pass through them 

Fig. 11. The alpha puzzle problem, version 1.2 

Fig. 12. The values in the table are averages over ten runs for each 
sample. Times are shown in minutes 

4 Experimental Results 

We implemented a path planner for free-flying objects with six degrees of 
freedom in a three-dimensional workspace. The code was written in C++ and run 
on a PC with a 2.4 GHz Intel Pentium 4 CPU and 512 MB of RAM. 

In the following, we analyze the performance of the method (measured as its 
capability to solve the problems) on a few scenes. In all cases we used a 
free-flying robot with six dof. The various environments, and some 
representative configurations of the robot, are shown in Figures 9, 10 and 11, 
which present the results of applying the technique to the problems. We present 
three problems of different difficulty levels, labeled Sample1, Sample2 and 
Sample3. Below we discuss the environments in more detail. 

Sample1: This scene contains two obstacles, and the robot has a more complex 
shape. There is a narrow corridor which makes the problem difficult to solve; 
nevertheless, the heuristic is able to find a path which goes through the 
corridor. Figure 9 shows how the robot passes through the corridor. 

Sample2: This environment contains two grids, and the robot is shaped like an 
“L”. There are several narrow passages, and the distance between the grids is 
small, so the robot has a reduced area in which to move and rotate between the 
grids. Figure 10 shows how the heuristic was able to find a path between the 
start and goal configurations. 

Sample3: This problem is well known as the “alpha puzzle” problem. There exist 
several different versions of this problem [1]. Figure 11 shows the solution 
for version 1.2 of the problem. The difficulty of this problem is that there 
are few free configurations in the corridor, so it is very hard to solve. Our 
method is able to find a path that solves it. In the figure, some 
configurations in the corridor are shown. 

The table in figure 12 compares the performance of the basic PRM algorithm 
with that of the PRM algorithm based on geometric features proposed in this 
work. The time used by the new proposal is larger than that used by the basic 
PRM; nevertheless, we think that this time is well spent (because a solution 
can be found). 

5 Conclusion 

We have described a new randomized roadmap method for motion planning 
problems. To test the concept, we implemented the method for path planning for 
“free flying objects” in a three-dimensional space. The method was shown to 
perform well. We can say that geometric features of the workspace can be used 
to build heuristics that guide the search for free configurations in narrow 
corridors. We continue to work on free-flying object problems, and we are 
adding geometric features to the heuristic in order to improve the connectivity 
of configuration space. 

Abstract. In this paper a novel sensory-motor neural controller applied to 
robotic systems for reaching and tracking targets is proposed. It is based on 
how the human system projects sensory stimuli onto the motor joints, sending 
motor commands to each articulation and avoiding, in most phases of the 
movement, the feedback of visual information. The proposed neural architecture 
autonomously generates a structure of learning cells based on adaptive 
resonance theory, together with a neural mapping of the sensory-motor 
coordinate systems in each cell of the arm workspace. It permits fast open-loop 
control based on the proprioceptive information of the robot and a precise 
grasping position in each cell by mapping 3D spatial positions onto redundant 
joints. The proposed architecture has been trained, implemented and tested on a 
visuo-motor robotic platform. Its robustness, precision and velocity 
characteristics have been validated. 



1 Introduction 

One of the topics in robotics is the problem of solving the inverse kinematics 
of redundant visuo-motor systems for reaching applications in real time. Most 
of the proposed solutions are based on closed-loop control systems. They are 
highly dependent on the vision system and also need to track the entire 
trajectory of the robot arm end-effector. Although these control systems are 
widely employed in robotics and give good results [1], the human sensory-motor 
coordination system does not require visual tracking of the joints, whose 
proprioceptive information is learnt [2] during action-perception cycles, 
mainly during childhood. 

The main objective of this work is to solve the inverse kinematics of robots 
without knowledge of the internal physical properties of the robot arm, such as 
joint lengths and the rotation and translation thresholds of each joint. An 
algorithm that gives such a solution has the advantage of avoiding continuous 
calibration of the system while being independent of the robotic platform 
considered. The proprioceptive information has to be learnt by mapping the 3D 
spatial position of the end-effector, given by the vision system, to the 
configuration of joint positions, given directly by the motor encoders. 

One of the difficulties in this work was the need for completely well-mapped 
spatial-motor and motor-spatial information [3], using previously learnt 
information to plan an action program in an anticipatory way. In this way, 
actions are produced quickly without a closed loop. The final workspace of the 
robot arm is autonomously divided into small structures called learning cells. 
The proposed model aims to achieve accurate sensory-motor coordination by means 
of two neural networks whose interconnection allows the anticipatory behaviour 
of the model. This interconnection is based on the self-organizing Adaptive 
Resonance Theory (ART algorithm) for discrete processes [4] and the AVITE model 
(Adaptive Vector Integration to End Point) [5]. The ART algorithm is a 
self-organizing neural network able to solve the stability-plasticity dilemma 
in the competitive learning phase. Using this algorithm in the proposed 
architecture makes the described anticipatory behaviour possible. On the other 
hand, the AVITE neural model, based on supervised learning, maps the 
spatial-motor positions in each learning cell. As a result, the proposed work 
is able to combine visual, spatial, and motor information for reaching objects 
with a robot arm, tracking a trajectory in which closed-loop control is only 
carried out within each learning cell of the workspace. The proposed 
architecture has been implemented on an industrial robot arm, and its 
robustness, adaptability, speed and accuracy have been demonstrated in reaching 
tasks, including perturbations of the target position. 



2 Neural Model Structure: Self-Organizing and Fast Mapping 

The proposed architecture is based on two interconnected neural models that 
sequentially project the 3D final position (sensory information) of the object 
to be grasped onto the joint positions (spatial information) of the robot arm 
end-effector. This task is performed in a predictive way by means of an 
adaptive distribution of the workspace. The basis of the control scheme is to 
generate random movements of the robot arm, whose end-effector position is 
detected and computed by a vision system. The 3D workspace of the robot arm is 
then divided into small cells at whose centres the precise positions of the 
robot joints are well known, by means of the proprioceptive information and a 
previous learning phase. In the operation phase, this produces an anticipatory 
movement of the robot toward the centre of the cell in which the target is 
located. The non-supervised neural model based on the ART algorithm includes a 
vigilance parameter that controls the competitive learning and the final 
position and dimension of each cell. By means of a second learning phase, one 
neural weight map is obtained for each cell. Due to the linear nature of the 
AVITE model, the spatial-motor projection is computed quickly, and only a few 
closed-loop control steps are required for accurate reaching tasks. The AVITE 
model projects the difference vector (DV), in visual coordinates, between the 
current and desired positions of the end-effector onto the incremental angular 
positions of the robot arm. Thus, the 3D visual distance inside the winner cell 
is reduced quickly and with high precision. The general operation of the 
proposed model is represented in the scheme of Fig. 1. 

Fig. 1. General scheme of the neurocontroller. It is formed by two interconnected neural 
models to map the 3D workspace (non-supervised model) and to compensate the spatial 
error between the current and desired final spatial position (supervised AVITE model) 

In this neural model, the vision system of the stereohead detects the position 
of the object to be grasped. The internal representation of that position is 
the input to the cell selector module. By means of a competitive algorithm, 
this module calculates the cell in whose workspace the target is located. The 
projection of the visual position of the centre of the cell onto the arm joint 
positions is achieved by the cell-centre visual projection module. Once the 
AVITE model has been executed, the difference between the centre of the cell 
and the desired position, in visual coordinates (DV), is estimated by the 
distance estimator module. Then, the DV compensation module reduces that 
distance by means of a few robot arm movements and linear projections. Finally, 
the resulting error is used to update the neuron weights of the AVITE model. 
This makes it possible to detect whether an unexpected situation occurs or 
whether some joints of the robot arm are mechanically blocked. In order to 
validate the behaviour of the proposed architecture in dynamic environments, 
perturbations of the target position have been considered, and the performance 
of cell switching when tracking moving objects has been tested. The results 
obtained highlight how the proposed architecture emulates human biological 
behaviour. In a human system, most of the time spent reaching for objects is 
dedicated to movement compensation due to possible perturbations in the 
measurements of both sensors, visual and proprioceptive. The associative maps 
generated by the AVITE algorithm make it possible to learn the gestures of the 
robot, including the mechanical faults of the robot system. 



3 Non-supervised Adaptive Generation of Learning Cells 

The non-supervised neural model implemented in the proposed architecture is 
based on the ART2 model developed by Carpenter et al. [4]. It is dedicated to 
dividing the workspace in spatial coordinates and to supplying the anticipatory 
behaviour of the neural architecture. Each 3D region is different and is 
characterized by the position of its centroid and its Voronoi frontiers, which 
determine the final dimension of each cell. How the cellular structure is 
defined in the spatial frame determines the number of steps and the precision, 
or final error, of the neural model in reaching tasks. The structure of the ART 
model allows the final cell configuration to be controlled by means of a 
vigilance parameter and the number of learning trials. The structure of the ART 
neural network is represented in Fig. 2. 




Fig. 2. ART structure for adaptive generation of learning cells. The input layer F_1 has the 
same dimension as the input vector; the neurons (centroids) of the output layer F_2 are the 
patterns to be classified; w_ji are the feed-forward connection weights; v_ij are the feedback 
connection weights; the Gain Control is used to achieve network stability by inhibiting the 
activation control of the F_1 and F_2 layers; the Reset Signal is used to control the membership 
level of a pattern to the winner neuron in the F_2 layer; and, finally, p is the vigilance parameter 

In the learning phase, a random posture of the robot arm is first generated, 
and the spatial positions of both the end-effector and the target, referred to 
the robot arm coordinate frame, are computed by the vision system. Taking D as 
the number of d.o.f. of the robot, each posture is represented by the vector 
θ_v (dimension 1×D), while its corresponding end-effector position is 
represented by the vector P_xyz (dimension 1×3). A parameter p controls the 
adaptive generation at every learning step k. In each trial, the vector P_xyz 
is the input to the network. The winner centroid w_ji* is selected as the one 
nearest to the end-effector position in terms of Euclidean distance. Then, the 
value of each weight associated with this centroid is updated by means of 
equation (1), starting from random initial values: 

$$ w_{ji}(k+1) = v_{ij}(k+1) = \frac{e_i(k) + w_{ji}(k)\, N_j(k)}{N_j(k) + 1} \qquad (1) $$

where e_i(k) is the i-th component of the input vector P_xyz, and N_j represents 
the number of times that the j-th neuron of the F_2 layer has been the winner. 

The process will be repeated until the convergence of the neuron weights of the 
ART map is reached. The p parameter will be compared, in each iteration, with the 
Euclidean distance between the new patron position and the winner cell centroid. This 
comparison will determine the generation of new cells or the updating of computed 
centroids. In the operation phase, when a target position is detected, the network se- 
lects the winner cell. Then, it will project that sensorial over the spatial position of the 
robot arm, by means of the learnt propioceptive information. 

The next step is to compensate the difference vector (DV) between the calculated
current position of the robot arm and the desired position in sensory coordinates. The
cell generation makes it possible to know the most favourable posture of the robot arm,
that is, the one whose end-effector position is nearest to the target. Adding the ART
algorithm to the neural structure makes its implementation possible on any robotic
platform, independently of its internal dynamic model.
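
The cell-generation procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `train_centroids` is hypothetical, the first pattern assigned to a new cell is counted as one win, and only the spatial centroid is stored (the joint posture associated with each cell could be averaged in the same way). A new cell is created when the distance to the winner centroid exceeds ρ; otherwise the winner is updated with the running mean of equation (1).

```python
import numpy as np

def train_centroids(positions, rho):
    """Generate cell centroids from end-effector positions.

    positions: iterable of 3-D points (x, y, z); rho: vigilance radius.
    Returns (centroids, counts), where counts[j] is N_j.
    """
    centroids, counts = [], []
    for p in positions:
        p = np.asarray(p, dtype=float)
        if centroids:
            # Winner = centroid nearest to the input in Euclidean distance.
            dists = np.linalg.norm(np.array(centroids) - p, axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= rho:
                # Running-mean update, as in equation (1).
                centroids[j] = (p + centroids[j] * counts[j]) / (counts[j] + 1)
                counts[j] += 1
                continue
        # Vigilance test failed (or no cell exists yet): create a new cell.
        centroids.append(p)
        counts.append(1)
    return np.array(centroids), counts
```

With a larger ρ fewer cells are created, since more inputs pass the vigilance test of an existing centroid.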



4 Neural Associative Maps for Sensory-Motor Transformation 

The second neural model is dedicated to compensating the error in each cell. Every
cell behaves independently of the others, that is, if one cell is excited the
others are inhibited. All the cells implement the spatial-rotation transformation. In
order to control the robot arm, the neurocontroller must obtain the proprioceptive
data from the joints as well as the visual information, in accordance with the AVITE
learning model by which it is inspired. Fig. 3 shows the scheme of the learning system,
where TPVs is the desired spatial position of the arm; PPVs is the spatial position of
the cell centre; PPVm is the angular position of the robot arm joints; DVs is the
difference between TPVs and PPVs; and DVm is the result of the transformation
between spatial and rotation increments.

When a cell is excited, the centre of the cell loads its content into the PPVm and
PPVs vectors. The DVs vector holds the difference between the centre of the
cell and the desired position. The DVs is transformed into the DVm through a set of
neurons. The resulting increments are modulated by a Go(k) signal and the results
are integrated into the PPVm.

Fig. 3. Structure of the sensory-motor transformation in each learning cell, based on the AVITE
fast mapping in spatial (visual) coordinates. The centre of the cell stores the spatial coordinates
and the motor coordinates in that point



The learning phase is based on the knowledge acquired in action-reaction cycles.
During this phase, random increments are introduced in the DVm vector, the system
executes these movements, and their spatial effect is recorded in the DVs vector. In
this way, the neuron weights, given by the W matrix, are updated by means of the
gradient descent optimization algorithm. The position error given by the DV is
compensated by means of expression (2):



$$\Delta\theta = W \cdot \Delta S \qquad (2)$$

where the Δθ vector holds the incremental values to be added to the current position
of the robot arm in motor (joint) coordinates, and ΔS stores the DV in visual
coordinates. Each cell generates a neuron weight matrix whose dimension is the size of
the sensory coordinates (x, y, z) multiplied by the size of the motor coordinates (the
number of degrees of freedom of the robot arm). Thus, the dimension of the W matrix is
3×D for each of the N cells, where N is the number of learning cells and D is the
number of d.o.f. of the robot arm. The linearity of equation (2) has the advantage of
easy implementation on a hardware device such as a DSP or FPGA, and of fast
computation of the projection from spatial coordinates onto motor commands.
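
The action-reaction learning of one cell can be sketched as follows. This is an illustrative reading under explicit assumptions, not the authors' code: `forward_kinematics` is a toy 3-d.o.f. arm (base, shoulder, elbow) with unit link lengths standing in for the real robot and vision system, the gradient-descent step is a normalised LMS step, and row vectors are used so that the weight matrix has the 3×D shape stated in the text. Random joint increments are applied around the cell centre, their spatial effect is observed, and W is adjusted so that the observed spatial increment maps back to the joint increment.

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """Toy arm model (assumption): joint angles q -> end-effector (x, y, z)."""
    r = l1 * np.cos(q[1]) + l2 * np.cos(q[1] + q[2])
    z = l1 * np.sin(q[1]) + l2 * np.sin(q[1] + q[2])
    return np.array([r * np.cos(q[0]), r * np.sin(q[0]), z])

def learn_cell_matrix(q_centre, trials=3000, step=0.01, mu=0.5, seed=0):
    """Learn the 3 x D matrix W of one cell from random joint increments."""
    rng = np.random.default_rng(seed)
    q_centre = np.asarray(q_centre, dtype=float)
    s_centre = forward_kinematics(q_centre)
    W = np.zeros((3, q_centre.size))
    for _ in range(trials):
        d_theta = rng.uniform(-step, step, size=q_centre.size)  # random action
        d_s = forward_kinematics(q_centre + d_theta) - s_centre  # observed reaction
        err = d_theta - d_s @ W                                  # prediction error
        # Normalised gradient-descent (LMS) step on the squared error.
        W += mu * np.outer(d_s, err) / (d_s @ d_s + 1e-12)
    return W
```

Once W has been learnt, a correction inside the cell is a single matrix product, `d_theta = d_s @ W`, which is the linear operation that makes a DSP or FPGA implementation straightforward.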



5 Results 

The proposed system has been implemented in a real robotic installation, shown in
Fig. 4a, formed by an industrial robot arm and the LINCE anthropomorphic stereohead
with two colour cameras, which simultaneously detect the objective (a small red
sphere) and the robot arm end-effector (a green label on the gripper). The
implementation of the proposed neural architecture has focused on robotic
applications for reaching and tracking targets. The base, elbow and shoulder joints
have been considered for the experimentation. Firstly, the cells inside the robot
workspace have been generated with the described ART neural algorithm. Fig. 4b shows
a graphical representation of the results for 600 trials and ρ = 0.18, obtained with
the simulation software developed by DISA, Spain [6].




Fig. 4. Robotic installation. (a) Visuo-motor robotic system formed by the LINCE stereohead
and one ABB-1400 robot arm. (b) Simulation of the centroid distribution produced by the ART
algorithm. For ρ = 0.18, 24 centroids have been autonomously generated

To test the proposed model for reaching applications without perturbations, several
experiments have been carried out. Table 1 shows the most relevant results. It relates
the behaviour of the model to the number of generated cells and the final error
reached, which is given by (3):



E(k) = [expression lost in the source] (mm)    (3)



Table 1. Experimental scenarios for reaching tasks. Different ρ parameters, target positions
and desired errors have been considered. In all cases Go = 1. The end-effector initial position
was {900; 100; 400} and the target positions were T1 = {100; 900; 1000} and
T2 = {300; -900; 800}. The influence of the ρ parameter on the final precision and on the
time-to-reach can be observed

The results obtained for reaching tasks have been compared with other closed-loop
neural architectures on the same platform [7]. In this case, the times to reach the
object are reduced by about 60%.




Fig. 6. Evolution of the robot arm end-effector to reach the object with perturbations 





Fig. 7. Evolution of (a) the error and (b) the robot arm joint positions. Because the proposed
architecture allows the cell commutation to be controlled when perturbations happen, the error
decreases quickly by means of the open-loop positioning in the centre of every cell, the specific
neural weight matrix for each cell and the linear characteristics of the AVITE model

To test the behaviour of this architecture when unexpected variations in the target
position occur, experiments with instantaneous displacements of the object have been
carried out from P1 = {1000; 500; 900} to P2 = {700; -900; 400}. The results are
shown in Figs. 6 and 7. In this case, the object position moves from cell Nº 3 to
cell Nº 35. Thus, the cell commutation procedure is performed and the movement
compensation inside the second cell is computed by means of the inverse Jacobian
matrix learnt for that cell.




Fig. 8. Evolution of the 3D end-effector position when tracking an object moving with a
constant velocity of 7.6 cm/s. Five changes of cell are produced and the proposed
architecture quickly computes the next position by means of the associated weight matrix




Fig. 9. Evolution of (a) the error in spatial coordinates and (b) the 3D trajectory of the robot 
arm end-effector and the moving object 

Finally, to test the capabilities of the proposed architecture for tracking tasks,
constant movements of the object have been generated and the algorithm has been
executed. In this case, several cell commutations normally occur and small movement
compensations are generated inside each cell of the 3D spatial trajectory.
Appropriate filtering in joint space smooths abrupt variations of the end-effector
position. Figs. 8 and 9 show the results obtained for tracking tasks.



6 Conclusions 

In this paper a neural architecture based on human biological behaviour has been
presented, and the results obtained have been analysed for robotic reaching and
tracking applications with a head-arm system. The 3D spatial division of the robot arm
workspace into learning cells is proposed and is solved by means of a self-organizing
neural algorithm based on the ART2 model. In this process the proprioceptive
information is learnt. The error produced by the discrepancy between each cell centre
and the target position is compensated by means of an AVITE (Vector Associative
Map) adaptive architecture, which projects the difference vector of visual position
onto incremental joint positions of the robot arm. The results obtained on a robotic
platform have demonstrated that the final error in reaching applications can be very
low, while the model remains robust and fast in operation.

Abstract. This paper describes the development of local vision-based
behaviors for the robotic soccer domain. The behaviors, which include
finding ball, approaching ball, finding goal, approaching goal, shooting
and avoiding, have been designed and implemented using a hierarchical
control system. The avoiding behavior was learned using the C4.5 rule
induction algorithm; the rest of the behaviors were programmed by hand.
The object detection system is able to detect the objects of interest at a
frame rate of 17 images per second. We compare three pixel classification
techniques: one technique is based on color thresholds, another is based
on logical AND operations and the last one is based on the artificial
life paradigm. Experimental results obtained with a Pioneer 2-DX robot
equipped with a single camera, playing in a forward role on an enclosed
soccer field, indicate that the robot operates successfully, scoring goals
in 90% of the trials.



1 Introduction 

Robotic soccer is a common task for artificial intelligence and robotics research 
[1]; this task permits the evaluation of various theories, the design of algorithms 
and agent architectures. This paper focuses on the design and evaluation of 
perceptual and behavioral control methods for the robotic soccer domain; these 
methods are based on local perception, because it permits designers to program 
robust and reliable robotic soccer players that are able to cope with highly 
dynamic environments such as RoboCup environments. 

Vision is the primary sense used by robots in RoboCup. We used a local vision 
approach with an off-board computer. In this approach, the robot is equipped 
with a camera and an off-board image processing system determines the com- 
mands for the robot. We used this approach because of the advantages that 
it offers, which include lower power consumption, faster processing and the fact 
that inexpensive desktop computers can be used instead of specialized vision pro- 
cessing boards. We compare three strategies for pixel classification. One strategy 
is based on color thresholds [8], another is based on the algorithm of Bruce et 
al. [6] and the last one is based on the artificial life paradigm. 

Behaviors were designed and implemented using a hierarchical control system
with a memory module for a reactive robotic soccer player [5]. The behaviors,
which include finding ball, approaching ball, finding goal, approaching goal, and
shooting, were programmed by hand. The avoiding behavior was learned via
direct interaction with the environment with the help of a human operator using
the C4.5 rule induction algorithm [9].

The paper is organized as follows. Section 2 reviews related work. Section 3 
describes the methodological approach used in the design of our robotic soccer 
player. Section 4 summarizes the experimental results obtained. Finally, Section 5
discusses conclusions and perspectives. 

2 Related Work 

2.1 Vision 

The cognachrome vision system©, manufactured by Newton Research Labs, is 
a commercial hardware-based vision system used by several robot soccer teams 
[13]. Since it is hardware-based, it is faster than software running on a general- 
purpose processor. Its disadvantages are its high cost and the fact that it only 
recognizes three different colors. 

A number of past RoboCup teams have used alternative color spaces such as
HSB or HSV for color discrimination, as proposed by Asada [2], since these separate
color from brightness, reducing sensitivity to light variations.

Several RoboCup soccer teams have adopted the use of omnidirectional vision
generated by the use of a convex mirror [3]. This type of vision has the advantage
of providing a panoramic view of the field, at the cost of image resolution. Moreover,
the profiles of the mirrors are designed for a specific task.

The fast and cheap color image segmentation method for interactive robots [6]
employs region segmentation by color classes. This system has the advantage of being
able to classify up to 32 colors using only two logical AND operations, and
it uses alternative color spaces.

For our vision system, we used the pixel classification technique proposed 
by Bruce [6] and a variant of the color spaces proposed by Asada [2] (see 
Section 3.2). 

2.2 Control 

Takahashi et al. [12] used multi-layered reinforcement learning, decomposing a 
large state space at the bottom level into several subspaces and merging those 
subspaces at the higher level. Each module has its own goal state, and it learns 
to reach the goal maximizing the sum of discounted reward received over time. 

Steinbauer et al. [11] used an abstract layer within their control architecture 
to provide the integration of domain knowledge such as rules, long term planning 
and strategic decisions. The origin of action planning was a knowledge base 
that contained explicit domain knowledge used by a planning module to find a 
sequence of actions that achieves a given goal. 

Bonarini et al. [4] developed a behavior management system for fuzzy behavior
coordination. Goal-specific strategies are reached by means of conflict resolution
among multiple objectives. Behaviors can obtain control over the robot
according to fuzzy activation conditions and motivations that reflect the robot’s
goals and situation.

Gomez et al. [7] used an architecture called dynamic schema hierarchies. In 
this architecture, the control and the perception are distributed on a schema 
collection structured in a hierarchy. Perceptual schemas produce information 
that can be read by motor schemas to generate their outputs. 

We used a behavior-based control system or subsumption architecture with 
a memory module in order to control our robotic soccer player (see Section 
3.3). 



3 The System 

3.1 Hardware and Settings 

The robot used in this research is a Pioneer 2-DX mobile robot made by ActivMedia©,
equipped with a Pioneer PTZ camera, a manually adapted fixed gripper and a radio
modem. The robot is 44 cm long, 38 cm wide and 34 cm tall, including the video
camera. The robot is remotely controlled by an AMD Athlon 1900 computer with 512 MB
of RAM. Figure 1(a) shows a picture of our robotic soccer player.

The environment for the robot is an enclosed playing field with a size of 
180 cm in length and 120 cm in width. There was only one goal, painted cyan, 
centered in one end of the field with a size of 60 cm wide and 50 cm tall. The 
walls were marked with an auxiliary purple line whose height is 20 cm from the 
floor. Figure 1(b) shows a picture of the playing field. 

Fig. 1. The robotic soccer player (a). The soccer playing field (b)

3.2 Vision

A robust, fast and fault tolerant vision system is fundamental for the robot, since 
it is the only source of information about the state of the environment. Since 
all objects of interest in the environment are colored, we believe that vision is 
the most appropriate sensor for a robot that has to play soccer. We present 
below the object detection system used by the robot and a strategy for pixel 
classification based on the artificial life paradigm. 

Object Detection. The vision system processes images captured by the robot’s 
camera and reports the locations of various objects of interest relative to the 
robot’s current location. The objects of interest are the orange ball, the cyan 
goal and the auxiliary purple line on the field’s wall. The steps of our object 
detection method are: 

1. Image Capture: Images are captured in RGB in a 160 x 120 resolution. 

2. Image Resizing: The images are resized to 80 x 60 pixels. 

3. Color Space Transformation: The RGB images are transformed into the 
HUV color space, for reducing sensitivity to light variations. 

4. Pixel Classification: Each pixel is classified by predetermined color thresh- 
olds in RGB and HUV color spaces. There are three color classes: the col- 
ors of the ball, the goal, and the auxiliary line. The pixel classification is 
based on [6], in order to use only two logical AND operations for each color 
space. 

5. Region Segmentation: Pixels of each color class are grouped together into 
connected regions. 

6. Object Filtering: False positives are filtered out via region size. 
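
The pixel classification of step 4 can be sketched as follows. This is a minimal illustration in the spirit of [6], not the authors' implementation: each channel value indexes a 256-entry table whose bit k is set when the value lies inside the range of color class k, so a pixel is classified with two bitwise AND operations. The threshold values, the function names and the channel order are assumptions made for the example.

```python
import numpy as np

def build_tables(thresholds):
    """thresholds: one ((lo0, hi0), (lo1, hi1), (lo2, hi2)) triple per color class.

    Returns a 3 x 256 array; bit k of tables[c, v] is set when value v of
    channel c lies inside the range of class k (at most 32 classes).
    """
    tables = np.zeros((3, 256), dtype=np.uint32)
    for k, ranges in enumerate(thresholds):
        for c, (lo, hi) in enumerate(ranges):
            tables[c, lo:hi + 1] |= np.uint32(1 << k)
    return tables

def classify(image, tables):
    """image: H x W x 3 uint8 array. Returns an H x W bitmask of color classes."""
    return (tables[0][image[..., 0]]
            & tables[1][image[..., 1]]
            & tables[2][image[..., 2]])

# Hypothetical thresholds: class 0 (bit 1) and class 1 (bit 2).
example_tables = build_tables([
    ((200, 255), (80, 160), (0, 60)),
    ((0, 80), (180, 255), (180, 255)),
])
example_image = np.array([[[230, 120, 30], [40, 220, 220], [128, 128, 128]]],
                         dtype=np.uint8)
example_classes = classify(example_image, example_tables)
```

A result of 0 means that the pixel belongs to no class; connected regions of equal class can then be grouped in the region segmentation step.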

Figure 2(a) shows an image captured by the frame grabber and Figure 2(b) 
shows the robot’s perception. 




Fig. 2. Image captured by the camera (a). The robot’s perception (b)

Artificial Life Approach for Pixel Classification. In order to reduce the
time invested in pixel classification, the most expensive step in object detection,
we tested an artificial life-based method. Ideas of distributed computing were
taken from Reynolds’s boids [10], where a group of agents moves as a flock of
birds or a school of fish. For this strategy, we used 2500 agents, each having an
internal state to indicate whether it is over an object of interest or not. Agents
were able to detect three color classes: the colors of the ball, the goal and the
auxiliary line on the walls. Agents were serialized by an agent manager, which
assigned movement turns and prevented collisions between agents. However, the
recognition task is distributed among agents. The agents can move in their world,
which is the image perceived by the camera. Only one agent can be located over
each pixel. Agents can sense the color intensity values in the image in order to
perform pixel classification. The locomotion of an agent consists of moving pixel
by pixel via its actuators. Figure 3 shows a snapshot of the pixel classification
method based on artificial life.
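
The agent scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the function name `run_agents` is hypothetical, the image is given as a map of already known class labels, and each agent simply takes a random step to a free neighbouring pixel, whereas the original agents follow flocking-inspired movement rules. It shows the three constraints stated in the text: a manager that assigns turns serially, at most one agent per pixel, and sensing that is local to each agent.

```python
import random

def run_agents(class_map, n_agents, steps, seed=0):
    """class_map: 2-D list of class labels (0 = background).

    Returns a dict {(row, col): label} with the labelled pixels that the
    agents have visited.
    """
    rng = random.Random(seed)
    h, w = len(class_map), len(class_map[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    positions = rng.sample(cells, n_agents)   # distinct start pixels
    occupied = set(positions)
    found = {}
    for _ in range(steps):
        # The agent manager serialises the agents: one movement turn each.
        for i, (r, c) in enumerate(positions):
            label = class_map[r][c]            # local sensing
            if label:
                found[(r, c)] = label
            dr, dc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
            nr, nc = r + dr, c + dc
            # Move only inside the image and onto a free pixel.
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in occupied:
                occupied.remove((r, c))
                occupied.add((nr, nc))
                positions[i] = (nr, nc)
    return found
```

The bookkeeping needed for turns and collisions is extra work per agent, which is consistent with the observation that fewer examined pixels do not automatically give a faster classifier.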




Fig. 3. Artificial life-based pixel classification 



3.3 Control 

Behaviors were designed and implemented using a subsumption architecture [5]
because this architecture offers the necessary reactivity for dynamic environments.
We incorporated a new element into this architecture, a memory module.
This module acts as a short-term memory that enables the robot to remember
past events that can be useful for future decisions. The memory module directly
affects the behaviors programmed into the robot.

The avoiding behavior is a horizontal behavior in the architecture that overwrites
the output of the rest of the behaviors in our vertical subsumption architecture.
The architecture was implemented using four threads in C++: one
for the vertical behaviors module, one for the memory module, one for controlling
the robot movements and one for the horizontal behavior that avoids collisions
with the walls. In this architecture, each behavior has its own perceptual input,
which is responsible for sensing the objects of interest. Each behavior writes its
movement commands to shared memory to be executed. The architecture used
for the robot’s control system is shown in Figure 4.

Fig. 4. The architecture of the system



3.4 Description of Modules and Behaviors 

1. Memory: This is an essential module for the achievement of the robot’s
global behavior. Memory, like the behaviors, has its own perceptual input to
sense the ball and the goal. The function of this memory is to remember
the last direction in which the ball or the goal was perceived from
the point of view of the robot. The memory module directly affects the
other behaviors because it writes the directions of the ball and the goal
to a shared memory used in the execution of the behaviors. There are six
possible directions that the memory has to remember: ball to the left, ball
to the right, centered ball, goal to the left, goal to the right and centered
goal.

2. Finding Ball: The robot executes a turn around its rotational axis until the 
ball is perceived. The robot turns in the direction in which the ball was last 
perceived. If this information was not registered then the robot executes a 
random turn towards the left or right. 

3. Approaching Ball: The robot centers and approaches the ball until the ball 
is at an approximate distance of 1 cm. 

4. Finding Goal: The robot executes a turn around its rotational axis until the 
goal is perceived. The robot turns in the direction in which the goal was last 
perceived. If this information was not registered then the robot executes a 
random turn towards the left or right. 

5. Approaching Goal: The robot executes a turn in the direction of the center 
of the goal until the goal is centered with respect to the point of view of the 
robot. 

6. Shooting: The robot abruptly increases its velocity to shoot the ball
towards the goal. There are two possible kinds of shot: a short shot, when
the robot is close to the goal (a distance equal to or less than 65 cm), and a
long shot, when the robot is far from the goal (more than 65 cm).

7. Avoiding: The robot avoids crashing against the walls that surround the soccer
field. Manually determining the conditions in which the robot
collides with the wall is difficult because the wall can be perceived in many
forms; therefore, we used the machine learning algorithm C4.5 [9] to learn
whether a collision must be avoided or not.
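
The interplay between the memory module and the behaviors listed above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions, not the authors' four-thread C++ implementation: the class and field names are hypothetical, and the priority order (avoiding first, then the ball-related and goal-related behaviors, then shooting) is one plausible reading of the description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Percept:
    ball: Optional[str] = None      # "left", "right", "center", or None if not seen
    goal: Optional[str] = None
    ball_near: bool = False         # ball within reach of the gripper
    wall_collision: bool = False    # output of the learned collision classifier

@dataclass
class Memory:
    last_ball: Optional[str] = None
    last_goal: Optional[str] = None

    def update(self, p: Percept) -> None:
        # Remember the last direction in which each object was perceived.
        if p.ball is not None:
            self.last_ball = p.ball
        if p.goal is not None:
            self.last_goal = p.goal

def select_behavior(p: Percept, mem: Memory) -> str:
    mem.update(p)
    if p.wall_collision:            # horizontal behavior overrides the rest
        return "avoiding"
    if p.ball is None:
        return "finding ball"       # would turn towards mem.last_ball
    if not p.ball_near:
        return "approaching ball"
    if p.goal is None:
        return "finding goal"       # would turn towards mem.last_goal
    if p.goal != "center":
        return "approaching goal"
    return "shooting"
```

For instance, after the ball has been seen on the left and then lost, the selected behavior is finding ball while the memory still holds "left" as the turning direction.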








Development of Local Perception-Based Behaviors 541 



4 Experimental Results 

4.1 Pixel Classification Results 

We present the results obtained by three implementations of pixel classification.
The first implementation was based on color thresholds [8], the second
implementation was based on the algorithm proposed by Bruce et al. for pixel
classification [6], and the third implementation was based on the artificial
life paradigm.



Table 1. Pixel classification results

Method                         Images per second   Average processing time
Color thresholds               12                  0.0874 s
Bruce-based method             18                  0.0553 s
Artificial life-based method   14                  0.0707 s



Results of pixel classification are shown in Table 1. As this table indicates,
the worst strategy for the pixel classification task was the one based on color
thresholds [8]. The best strategy for this task was the one based on the algorithm
proposed by Bruce et al. [6]; this strategy was implemented as a step in the object
detection system for the robotic soccer player. We expected better performance from
the pixel classification method based on artificial life, because this method needs
to examine only 2500 pixels, corresponding to the total number of agents, instead of
the total number of pixels in the image (8600 pixels). However, in this strategy each
of the agents spends time calculating its next movement, which results in
intermediate overall performance.

4.2 Avoiding Behavior Results 

For the avoiding behavior, we collected a training set of 446 instances. There
were 153 positive samples, in which there was a collision, and 293 negative
samples, in which there was no collision. The elements of the input vector
were roundness, compactness, convexity, orientation, contour length, mean
length of runs, line index of lower right corner point, column index of lower right
corner point, row of the largest inner circle and column of the largest inner circle
of the ball’s region detected by the object detection system. The experiments
were validated using 10-fold cross-validation. We tested 5 machine learning
algorithms for the classification task; the results obtained are summarized in
Table 2. As this table shows, the C4.5 algorithm obtained the best percentage of
correctly classified instances for the collision avoidance task. The rules generated
by the C4.5 algorithm were implemented in our avoiding behavior.

Table 2. Percentage of correctly classified instances by machine learning algorithm for
the avoiding behavior

Machine learning algorithm     % of correctly classified instances
Support Vector Machines        91.25 ± 0.0603 %
Artificial Neural Networks     90.40 ± 0.0849 %
C4.5                           92.20 ± 0.0638 %
Naive Bayes                    87.62 ± 0.0683 %
Conjunctive Rules              90.68 ± 0.0222 %

4.3 Global Performance

Our robotic soccer player has a forward role; thus its main task is to score goals
in a minimum amount of time. In order to test the global performance of our
robotic soccer player, we designed a set of experiments. The experiments were
performed on the soccer field shown in Figure 1(b). The robot position, robot
orientation and ball position were selected 20 times randomly as follows:

1. For selecting the robot’s position, the field was divided into 24 cells of equal 
size. Figure 5(a) shows the cells for the robot position. 

2. For selecting the ball’s position, the field was divided into 9 cells of equal 
size. Figure 5(b) shows the cells for the ball position. 

3. For selecting the robot’s orientation, there were 4 possible directions,
defined by where the goal is with respect to the robot: 1) in front of the robot,
2) to the left of the robot, 3) behind the robot and 4) to the right of the robot.
Figure 5(c) shows the possible orientations for the robot.




Fig. 5. Configuration of the experiments. Robot position (a). Ball position (b). Robot
orientation (c)



An experiment’s configuration can be represented as a triplet of the form (robot
position, ball position, robot orientation). The configurations for the 20
experiments performed were: (24,7,1), (24,8,1), (21,2,2), (8,8,4), (18,7,3), (22,9,2),
(24,4,4), (7,4,3), (6,4,3), (8,2,2), (15,1,3), (21,4,1), (12,2,2), (11,9,1), (7,8,4),
(20,9,1), (7,9,4), (11,9,4), (10,5,2) and (6,2,3). Table 3 summarizes the time
spent, in seconds, in each behavior performed by the robot in the experiments.
The total time spent by the robot in the experiments was 632 seconds.

The percentage of time used by behaviors in the experiments was 28% for 
Finding Ball, 32.27% for Approaching Ball, 14.24% for Finding Goal, 9.49% for 
Approaching Goal and 16% for Shooting. The average time required by the robot 
to score a goal is 35.11 seconds. 
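
The figures in this paragraph follow from the column totals of Table 3 and the 18 goals scored. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
# Column totals of Table 3, in seconds.
totals = {
    "Finding Ball": 177,
    "Approaching Ball": 204,
    "Finding Goal": 90,
    "Approaching Goal": 60,
    "Shooting": 101,
}
total_time = sum(totals.values())

# Share of the total time used by each behavior, in percent.
shares = {name: 100.0 * t / total_time for name, t in totals.items()}

# Average time per goal over the 18 successful experiments.
goals_scored = 18
avg_time_per_goal = total_time / goals_scored
```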

Table 3. Time spent, in seconds, in each of the behaviors executed by the robot during
20 experiments. The symbol "-" indicates that a behavior in a given experiment was not
executed. An experiment that contains only "-" symbols is an unsuccessful experiment
in which the robot got stuck against the walls

Experiment   Finding  Approaching  Finding  Approaching
Number       Ball     Ball         Goal     Goal         Shooting  Duration
1            16       10           10       -            6         42
2, 3         11       19           17       1            6         54
4            13       9            3        4            5         34
5            8        5            -        2            6         21
6            8        11           7        10           6         42
7            6        31           6        3            6         52
8            5        9            7        1            5         27
9            5        6            -        5            5         21
10           8        6            -        4            5         23
11           2        16           8        -            6         32
12, 13, 14   17       20           11       6            6         60
             -        7            -        3            6         16
15           16       11           -        7            5         39
16           -        5            -        -            6         11
17           27       8            -        4            6         45
18           19       8            -        3            6         36
19           6        12           11       2            5         36
20           10       11           10       5            5         41
Totals       177      204          90       60           101       632 sec



The avoiding behavior was successful: the robot avoided the walls in 10 of 12
avoidance situations, an 83% success rate.

A useful functionality of the soccer player emerges from the interaction
of three behaviors: approaching ball, finding goal and avoiding. This emergent
behavior consists of regaining the ball from the corner. In the experiments, the
robot was able to regain the ball from the corner four out of five times, an
80% success rate. In the 20 experiments executed, the robot was able to score 18
goals, a 90% success rate.

5 Conclusions 

In this paper, we presented our research on the development of local perception- 
based behaviors for a Pioneer 2-DX robot equipped with a single camera. 

The subsumption architecture used for the robot control gives the necessary
reactivity to play soccer, and the memory that we incorporated enables the robot
to base its decisions on past events.

The avoidance behavior was much easier to learn than to program by hand.
Building the avoiding behavior using the C4.5 algorithm to learn to avoid
collisions with the walls was successful.

Although the strategy for pixel classification based on artificial life did not 
improve the performance, it seems to be a promising strategy to create a com- 
pletely distributed control system for a robotic soccer player. The main limi- 
tation of this approach is the current computational processing power needed 
to support a large number of agents with complex behaviors. Using our object 
detection method we can detect the ball, goal and auxiliary line, at a frame rate 
of 17 frames per second. 

Experimental results obtained with our robotic soccer player indicate that the 
robot operates successfully showing a high-level intelligent behavior and scoring 
goals in 90% of the trials. 

In future work, we will use other machine learning techniques to help us 
develop behaviors such as approaching ball. The next step to reach in our research 
is to consider multi-robot coordination, an important issue in robot soccer. 

Abstract. In this paper we tackle the problem of providing a mobile
robot with the ability to build a map of its environment using data 
gathered during navigation. The data correspond to the locations vis- 
ited by the robot, obtained through a noisy odometer, and the distances 
to obstacles from each location, obtained from a noisy laser sensor. The 
map is represented as an occupancy grid. In this paper, we represent the 
process using a Graphical Representation based on a statistical struc- 
ture resembling a Hidden Markov model. We determine the probability 
distributions involved in this Graphical Representation using a Motion 
Model, a Perception model, and a set of independent Bernoulli random 
variables associated with the cells in the occupancy grid forming the 
map. Our formulation of the problem leads naturally to the estimation 
of the posterior distribution over the space of possible maps given the 
data. We exploit a particular factorization of this distribution that
allows us to implement an Importance Sampling algorithm. We show the
results obtained by this algorithm when applied to a data set obtained
by a robot navigating inside an office-type indoor environment.



1 Introduction 

Robust navigation in natural environments is an essential capability of truly
autonomous mobile robots. Providing robots with this skill, however, has turned out
to be a difficult problem. This is particularly true when robots navigate in unknown
environments where globally accurate positioning systems, such as GPS, are not
available.

In general, robots need a map of their surroundings and the ability to locate
themselves within that map in order to plan their motion and navigate successfully.
This is why robot mapping and localization are now considered
a fundamental component of autonomous mobile robots [18] [10].

Some approaches consider the problem of mapping under the assumption 
that the locations visited by the robot are known [20]. This situation is not 
realistic, however, if we consider that sensors of location, such as odometers, 
carry error in the location measurement. On the other hand, some approaches 
consider the problem of localization, under the assumption that a map of the 
environment is available [3] [6] [15]. The actual situation in applications, however,
is that neither the map nor the locations are known. This has led to research on
the simultaneous localization and mapping (SLAM) problem using sensor data
collected by the robot [13] [21] [19] [10].

Fig. 1. Map obtained from raw data

With today's technology, most SLAM approaches consider a robot equipped
with an odometer, which collects information about the robot's ego-motion, and
range sensors, such as sonars or laser range finders, which measure the
distances to the nearest obstacles. As an example, Figure 1 shows a map drawn
from raw odometer and laser readings collected by a mobile robot. The figure
shows how odometry error accumulates, so that the robot appears to have visited
two different corridors instead of the single straight hallway it actually
traversed.

In this paper, we understand SLAM as an estimation problem, where the 
data correspond to odometer and range sensor readings collected by the robot 
during its trajectory. The goal is to estimate the posterior distribution of the 
map and the locations visited by the robot given the data. Odometer readings 
correspond to rotation and translation measures of the robot movements. Range 
readings correspond to distances to the closest obstacles, with respect to the 
robot location. These distances are measured in a set of previously specified 
directions. 

The map is represented by an occupancy grid [5]. In this context, we understand
a map of a given environment as a random matrix, each component of which is
associated with a spatial location in the environment. The set of associated
locations corresponds to a regular grid, and each component of the map
takes either the value 1 or 0 depending on whether the corresponding location
is occupied or not. We express our knowledge of the map through its posterior
distribution given the information provided by the robot.

This paper is organized as follows. Section 2 reviews relevant previous work
on SLAM. Section 3 discusses the details of our probabilistic approach. Section 4
shows the results of applying our methodology to real data collected by a robot
navigating inside an indoor office-building environment. Finally, Section 5
presents the conclusions of this work.



2 Previous Work 

Although there is an extensive research literature touching on mapping or
localization for mobile robots 1 , the SLAM problem is a relatively new research
area, where most efforts have been made over the last decade. An important
family of approaches to SLAM is based on versions of the Kalman filter. The
pioneering development in this area is the paper by Smith et al. [16], who
essentially propose the Hidden Markov Model (HMM) approach widely used today.
Smith et al. present an application of the Kalman filter to the problem of
estimating topological maps. They assume a fixed number of landmarks in the
environment, where these landmarks can be identified by their Cartesian
coordinates. At a fixed time instant, the set of landmark coordinates and the
location of the robot are assumed to be unobservable or latent variables. As in
the Kalman filter, the main assumption is that the posterior distributions of
all these variables are Gaussian and that the observations, given the latent
variables, can be described as a linear function plus a white noise term.

These two assumptions are somewhat restrictive. The Gaussian assumption 
makes this approach unsuitable for multimodal distributions that arise when 
the location of the robot is ambiguous. The linearity assumption is not met 
in general, since the relation between odometry and locations involves trigono- 
metric functions. The Extended Kalman Filter (EKF) [11] partially handles 
non-linearity using a Taylor approximation. 

For the non-Gaussian case, Thrun et al. [22] postulate a general approach that
can be used with general distribution functions. Under this approach, however,
computing maximum likelihood estimates is computationally too expensive. In
[20], Thrun presents an application of the Expectation-Maximization (EM)
algorithm [4] to mapping. The map is treated as the parameter to be estimated,
while the locations are treated as part of a Hidden Markov Model. Thus,
Thrun proposes maximizing the expected log likelihood of the observations and
the locations, given the map.

A more recent successful approach to solving the SLAM problem is the Fast- 
SLAM algorithm [13]. This approach applies to topological maps, and is based 
on a factorization of the posterior distribution of maps and locations. The models 
that determine the process are the ones used in [20]. 

On the other hand, [9] present an approach that is also based on the
description in [20], but applied to occupancy grids. This approach finds
locations iteratively over time. At each point in time, the algorithm estimates
the location visited by the robot as the location that maximizes the
probability of the current data, given past data and previous location
estimates. The next step finds the map as the one that maximizes the posterior
probability of the estimated locations and the observed data.

1 See [20] for a good overview of the literature in this area.

Our mapping approach applies to occupancy grid maps of static environments.
Our formulation of the problem is based on the approach by Thrun et al.
[22]. We build a Graphical Representation of that formulation where the
locations are considered unobservable variables determining the observed
odometer readings and, together with the map, determining the observed laser
readings. The probability model of the whole process is determined by the
Motion and Perception Models and a prior distribution for the map.

As opposed to previous approaches, we propose a fully Bayesian approach whose
goal is to estimate the posterior distribution of the map using simulation.
Thus, our approach gives a formal description of the entire process and develops
a Bayesian solution. Because it uses a more general Motion Model than the one
used by the Kalman filter approaches, it is applicable to a wider set of
problems. A further advantage is that, unlike the EM-based solution, it does not
provide a single estimate of the map; instead, it produces multiple maps that
convey the variability around the expected posterior map. As for localization,
we obtain a simulation of the locations visited by the robot from their
posterior distribution as an intermediate step while simulating maps.



3 Our Approach 

We describe the SLAM problem in probabilistic terms. We assume that there 
are true but unobservable locations visited by the robot and that there are 
true but unobservable distances to obstacles from each of those locations. These 
locations and distances determine the map of the environment, represented as an 
occupancy grid. Odometer and laser readings correspond to the observed values 
of true locations and distances to obstacles, respectively. We assume that the 
observations are centered at their true counterparts, having random variations 
around them 2 . 

Figure 2a) shows a Graphical Representation of the problem where non- 
observable variables have been circled for clarity. Our Graphical Representation 
was developed originally in [1] and there is a similar, but less specific Graphical 
Representation, developed independently in [14]. The Graphical Representation 
we develop here has been subsequently adopted in [8]. 

In Figure 2, $U_t$ represents the difference between odometer readings at times
$t-1$ and $t$. $Z_t$ represents the location of the robot at time $t$.
$\theta_t$ represents the distances from $Z_t$ to the closest obstacles in front
of the robot at time $t$. It is important to note that $\theta_t$ is fully
determined by the location of the robot, $Z_t$, and the map, $M$. Finally, $V_t$
represents the laser readings at time $t$.

Writing $U$ and $V$ to denote the matrices of odometer reading differences and
laser readings collected from time 1 to $T$, the SLAM problem can be expressed
as the problem of determining the posterior distribution of $M$ given $U$ and
$V$, $P(M \mid U, V)$.

2 From now on we omit the word “true” when referring to true locations and true
distances to obstacles.

Fig. 2. a) Graphical Representation of the problem, b) Translation and rotation
components of movement from $Z_{t-1}$ to $Z_t$

3.1 Motion and Perception Models 

To complete the specification of the process we need to establish the
probabilistic models that drive the dependencies, in particular the so-called
Motion and Perception Models. We assume that, to move from $Z_{t-1}$ to $Z_t$,
the robot performs three independent actions: a first rotation, $\delta_{1,t}$,
to face the direction of the translation; a translation, $d_t$, from $Z_{t-1}$
to $Z_t$; and a final rotation, $\delta_{2,t}$, to face the orientation of the
robot at time $t$. These motions are shown in Figure 2b). The elements of $U_t$
correspond to the observed counterparts of $\delta_{1,t}$, $\delta_{2,t}$ and
$d_t$, respectively.

Denoting by $Z$ the matrix containing all locations from time 1 to $T$, and by
$Z^{t-1}$ the vector of true locations up to time $t-1$, we can write

$$P(Z \mid U) = \prod_{t=1}^{T} P(Z_t \mid U, Z^{t-1}) = \prod_{t=1}^{T} P(Z_t \mid U_t, Z_{t-1}). \qquad (1)$$

The term $P(Z_t \mid U_t, Z_{t-1})$ in the last product in equation (1) is known
as the Motion Model. Here we assume a Gaussian Motion Model, which determines
that the errors of $\delta_{1,t}$, $\delta_{2,t}$ and $d_t$ with respect to their
observed counterparts correspond to white noise with variances that are
proportional to these latter quantities.
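
As a minimal sketch of what one draw from such a Motion Model could look like
(this is not the authors' implementation), the Python function below perturbs
the three observed motions with Gaussian noise whose variance is proportional to
their magnitude and composes them into a new pose. The pose layout
`(x, y, heading)`, the name `sample_motion` and the proportionality constant
`alpha` are assumptions made only for illustration.

```python
import math
import random


def sample_motion(pose, odometry, alpha=0.05, rng=random):
    """Draw Z_t given Z_{t-1} = pose and U_t = (rot1, trans, rot2).

    alpha is a hypothetical constant: the noise variance of each motion
    component is alpha times the magnitude of its observed value.
    """
    x, y, heading = pose
    rot1, trans, rot2 = odometry

    def noisy(observed):
        # Gaussian centered at the observed value, variance alpha * |observed|.
        return rng.gauss(observed, math.sqrt(alpha * abs(observed)))

    r1, d, r2 = noisy(rot1), noisy(trans), noisy(rot2)
    # First rotation, then translation along the new heading, then final rotation.
    new_x = x + d * math.cos(heading + r1)
    new_y = y + d * math.sin(heading + r1)
    return (new_x, new_y, heading + r1 + r2)
```

With `alpha=0.0` the function reduces to plain dead reckoning, which is a
convenient sanity check before adding noise.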

Let $\theta$ be the matrix of all distances to obstacles up to time $T$ and
$P(V \mid \theta)$ be the conditional distribution of laser readings given
distances to obstacles. Assuming that laser beams read distances independently
from each other and also that these laser readings are independent over time,
we write

$$P(V \mid \theta) = \prod_{t=1}^{T} \prod_{i=1}^{N} P(V_{ti} \mid \theta_{ti}), \qquad (2)$$

where $N$ represents the number of laser beams, and $V_{ti}$ and $\theta_{ti}$
represent the laser reading and distance to the closest obstacle, respectively,
in the $i$-th direction at time $t$. The term $P(V_{ti} \mid \theta_{ti})$ is
known as the Perception Model. We assume that it corresponds to a truncated
Gaussian distribution with mean $\theta_{ti}$ and with known variance
$\sigma^2$ determined by the accuracy of the laser sensor. The Gaussian
distribution is truncated at 0 at the lower end and at the maximum range of the
laser at the other.
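
The Perception Model for a single beam can be sketched as a truncated Gaussian
density. The following Python function is an illustration under the
assumptions stated in the text (mean equal to the true distance, known
standard deviation, support between 0 and the maximum range); the function and
parameter names are hypothetical.

```python
import math


def perception_pdf(reading, distance, sigma, max_range):
    """Density of a laser reading given the true distance to the obstacle.

    A Gaussian with mean `distance` and standard deviation `sigma`,
    truncated to the interval [0, max_range] and renormalized.
    """
    if reading < 0.0 or reading > max_range:
        return 0.0

    def gaussian_cdf(x):
        return 0.5 * (1.0 + math.erf((x - distance) / (sigma * math.sqrt(2.0))))

    # Probability mass of the untruncated Gaussian inside the sensor range.
    mass = gaussian_cdf(max_range) - gaussian_cdf(0.0)
    z = (reading - distance) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density / mass
```

Dividing by the retained mass keeps the density integrating to one over the
sensor range, which matters most when the true distance lies near 0 or near
the maximum range.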

Finally, we consider that the prior distribution of $\theta_{ti}$ corresponds to
a geometric distribution, such that $P(\theta_{ti}) = (1-p)^{\theta_{ti}-1}\, p$,
where $p$ corresponds to the prior proportion of occupied cells in the map. In
this work we determine this value empirically by analyzing the data.



3.2 Importance Sampling 

In order to find an expression for the posterior distribution of $M$ given $U$
and $V$ we must integrate over pairs $(Z, \theta)$, as seen in Figure 2a).
Alternatively, we note that the map $M$ is fully determined by $Z$ and
$\theta$, thus we are interested in the posterior distribution
$P(Z, \theta \mid U, V)$. In the absence of a closed form for this expression,
we use Importance Sampling (IS) to sample observations from it.

Importance Sampling [7] [17] is a sampling algorithm used to estimate proba- 
bility distributions. It has been heavily used in recent years. The main idea in IS 
is to represent a distribution by a set of observations and a weight associated to 
each observation in the set. In this way, expected values and other features of the 
target distribution can be estimated as the weighted average of the observations. 

Consider a set of $n$ tuples $(x_j, w_j)$, $j = 1, 2, \ldots, n$, given by random
draws $x_j$ from a distribution $g$ and corresponding weights $w_j$. Liu and Chen
[12] define this set to be properly weighted with respect to the distribution
$\pi$ if

$$\lim_{n \to \infty} \frac{\sum_{j=1}^{n} h(x_j)\, w_j}{\sum_{j=1}^{n} w_j} = E_{\pi}(h(X)),$$

for any integrable function $h$. This means that if $x_1, x_2, \ldots, x_n$ are a
sample from $g$, the set of weights $w_j(x_j) = \pi(x_j)/g(x_j)$ properly weights
the sample with respect to $\pi$.
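
The weighted average in the definition above can be illustrated with a small
self-normalized Importance Sampling estimator. The Python sketch below is a toy
example, not part of the paper: the target $\pi$ is taken to be a Gaussian with
mean 1 and standard deviation 1, the proposal $g$ a Gaussian with mean 0 and
standard deviation 2, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_estimate(h, draw_g, pdf_g, pdf_pi, n, rng):
    """Estimate E_pi[h(X)] from n draws of g, weighted by w = pi / g."""
    xs = [draw_g(rng) for _ in range(n)]
    ws = [pdf_pi(x) / pdf_g(x) for x in xs]
    # Ratio of weighted sum to sum of weights, as in the definition above.
    return sum(h(x) * w for x, w in zip(xs, ws)) / sum(ws)


rng = random.Random(0)
estimate = importance_estimate(
    h=lambda x: x,                            # we estimate the mean of pi
    draw_g=lambda r: r.gauss(0.0, 2.0),       # proposal g
    pdf_g=lambda x: normal_pdf(x, 0.0, 2.0),
    pdf_pi=lambda x: normal_pdf(x, 1.0, 1.0), # target pi
    n=20000,
    rng=rng,
)
```

Because the estimator divides by the sum of the weights, the target density
only needs to be known up to a normalizing constant, which is what makes the
approach usable for posterior distributions without a closed form.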



3.3 Sampling from the Posterior Distribution of Maps 

Our IS approach relies on the factorization of $P(Z, \theta \mid U, V)$ given by
$P(\theta \mid U, V)\, P(Z \mid \theta, U, V)$. This decomposition suggests
sampling $\theta$ from its posterior distribution, $P(\theta \mid U, V)$, first,
and sampling $Z$ from $P(Z \mid \theta, U, V)$ afterwards.

In the case of $P(\theta \mid U, V)$, we approximate this distribution by the
product of the Perception Model and the geometric prior distribution of
$\theta$. Thus, according to the model described above, we sample values of
$\theta_{ti}$ from a truncated Gaussian distribution centered at the
corresponding laser reading $V_{ti}$ with standard deviation $\sigma$, and then
we associate with each sample a weight given by
$w_{ti} = (1-p)^{\theta_{ti}-1}\, p$. This set of weights properly weights the
set of samples with respect to the posterior distribution of $\theta$.
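
A rough sketch of this sampling step for a single beam is given below. It
assumes, for illustration only, that the geometric prior is evaluated on the
distance expressed as a whole number of grid cells; the parameter `cell_size`,
the rejection loop used for the truncation, and the function name are
hypothetical and not taken from the paper.

```python
import math
import random


def sample_distance(reading, sigma, max_range, p, cell_size, rng):
    """Draw one distance for a beam and return it with its prior weight."""
    # Truncated Gaussian centered at the laser reading, by simple rejection.
    while True:
        theta = rng.gauss(reading, sigma)
        if 0.0 <= theta <= max_range:
            break
    # Assumption: the geometric prior counts grid cells up to the obstacle.
    cells = max(1, math.ceil(theta / cell_size))
    weight = (1.0 - p) ** (cells - 1) * p
    return theta, weight
```

Rejection is cheap here because the laser noise is small compared with the
sensor range, so almost every draw falls inside the valid interval.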

Importance sampling again plays a key role when sampling locations from
$P(Z \mid \theta, U, V)$. Using the properties of the model described before,
we can show that (see [2] for details)

$$P(Z \mid \theta, U, V) \propto \prod_{t=1}^{T} P(Z_t \mid U_t, Z_{t-1})\, P(\theta_t \mid Z_t, Z^{t-1}, \theta^{t-1}). \qquad (3)$$

Equation (3) shows that, at each point in time, we can sample $Z_t$ from the
Motion Model and associate a weight

$$w(Z_t) = P(\theta_t \mid Z_t, Z^{t-1}, \theta^{t-1}) = \prod_{i=1}^{N} P(\theta_{ti} \mid Z_t, Z^{t-1}, \theta^{t-1}), \qquad (4)$$

to this observation. The term $P(\theta_t \mid Z_t, Z^{t-1}, \theta^{t-1})$
corresponds to a truncated geometric distribution that represents the degree of
agreement between the true distances to obstacles at time $t$, $\theta_t$, and
the fact that the robot is at the sampled location, $Z_t$, within the map built
from $Z^{t-1}$ and $\theta^{t-1}$.



4 Results 

In this section we show an application of the algorithm described in the
previous section to a data set obtained in the Wean Hall building at Carnegie
Mellon University. The data set was collected by a robot equipped with a laser
sensor and an odometer 3 .

The robot navigated back and forth along a hallway. During that journey,
3354 measurements were taken, each consisting of a pair of odometer
reading differences and laser readings. The laser sensor sends a beam every
degree, spanning an angle of 180°. Thus, there are $N = 180$ distances recorded
for each laser reading. A map drawn from raw data is shown in Figure 1. The
figure shows how odometry error accumulates, so that the robot appears to have
visited two different corridors instead of just one, as it did. The smoothness
of the depicted walls, however, suggests that the error in laser sensor readings
is small compared to the error in odometer readings.

We sampled distances to obstacles based on the Perception Model, with
$\sigma = 0.02$ m, and sampled locations afterwards, using IS with sample size
$n = 100$. Figure 3 shows a path obtained from a set of sampled locations, along
with the path obtained from raw odometer readings. The figure shows that
sampling using IS allows the robot to recover from odometry errors. Figure 4
shows the average map of the sample.

3 Data is a courtesy of Nicholas Roy.

Fig. 3. Path of the robot in Raw data (line) and Sampled Path (dotted)

Fig. 4. Average map using the proposed Algorithm

5 Conclusions 

This paper adds to the research in Statistics and Probabilistic Robotics by
proposing a complete probabilistic representation of the SLAM problem and
obtaining a full Bayesian solution. In particular, it contributes a new
algorithm to sample from the posterior distribution of maps. An intermediate
step of the algorithm provides observations from the posterior distribution of
locations; that is, we solve the localization problem at the same time.

This paper formalizes the problem of mapping as the problem of learning the 
posterior distribution of the map given the data. We work on an expression for 
this distribution and show that there is no closed form for it. Thus, we propose 
an algorithm based on Importance Sampling for obtaining a sample from the 
target posterior distribution. 

Importance Sampling proved to be a computationally efficient way to explore
the posterior distribution of the map. In addition, IS provided an
effective methodology to correct odometry error accumulated over time.

Although we do not have ground truth data to quantify the accuracy of the
algorithm, the average of the resulting map samples closely resembles the real
map of the environment. In the same way, samples from the robot trajectory
resemble the true path followed by the robot.

Abstract. A new method to detect 3D obstacles using a stereo vision
system and a 2D laser range finder is presented. A laser range finder
measures the distance to obstacles, but only on a plane parallel to the floor,
and stereo vision is not able to estimate distances to surfaces with little or
no texture. This paper explores a way to take advantage of both
kinds of sensors. The main idea is to project 3D points detected by the
laser telemeter, in the form of initial values for pixel disparities, into a
trinocular stereo vision system. Experimental tests using a mobile robot
in an indoor environment showed promising results.



1 Introduction 

Most robotic tasks require detecting obstacles around the robot. Recent
techniques for obstacle detection and mapping using single-plane laser range
finders on mobile robots have been successfully applied in indoor environments
[6]. Stereo vision systems have also been applied to 2D [5] and 3D mapping [1].
The disadvantage of a 2-D laser range finder is that it only detects obstacles
in a plane parallel to the floor, and its behavior depends on the reflectivity
of objects. Stereo vision also has disadvantages. A successful stereo system
must solve the correspondence and the reconstruction problems [7]. The
correspondence problem is to find which parts of the images taken from the
cameras are projections of the same scene element. The reconstruction problem
involves computing the 3-D location and structure of the observed objects.
Sometimes the correspondence problem cannot be solved, or it is very difficult
to solve. For example, correspondences are hard to find within large
homogeneous regions. To overcome this problem, some approaches process only
vertical lines or edges [5].

However, it is often useful in mobile robotics to fuse data from multiple,
possibly heterogeneous sensors. Sensor data can be fused at the data, the
feature, or the decision level [3]. Data fusion occurs when the raw data from
various sensors are combined without significant post-processing. Feature fusion
occurs when features are extracted from data at the sensor level before
combination. Fusion at the decision level requires that the fusion algorithm
accumulate data at the sensor level before fusing. An algorithm to fuse, at the
feature and data levels, range data from stereo vision and a lidar sensor is
described in [3]. It uses an occupancy grid map (the environment is divided into
regular regions and each region has an occupancy status) to fuse data from the
sensors. Other work fuses data from stereo vision and sonars [5], mixing maps
obtained separately by each sensor.

The goal of the present work is to combine, in a different way, a 2-D laser
range finder with a trinocular stereo vision system to form a more robust 3-D
obstacle detection system. The key idea is to exploit the range information
captured by the 2-D laser range finder in order to resolve some ambiguities of
the stereo vision system and get a better disparity map [7], and also to speed
up stereo calculations.

This paper introduces a variant of Shirai’s algorithm (described in [2]),
a correlation method using a variable window, where laser measurements are
included in the algorithm in the form of initial disparity values. In this way
some of the problems of stereo vision are solved (e.g., homogeneous surfaces).
To build a disparity map, images taken from the cameras are rectified using the
calibration method described in [4]. Using rectified images, the search for
correspondences is reduced to one dimension.

The rest of the paper is organized as follows. Section 2 shows the image
rectification process. Sections 3 and 4 describe the mapping from laser
measurements to image coordinates. Sections 5 and 6 show the combination of the
laser data into the stereo processing. Some experimental results using a mobile
robot are shown in Section 7.



2 Stereo Rectification 

In a standard stereo system where the cameras have parallel optical axes and
their image planes are orthogonal to the optical axes, the correspondence search
process is reduced to a search within the same pixel row. The process of
converting images taken with a given geometry to those corresponding to a
standard stereo system is called rectification [7].

In [4] a method based on an image registration technique is used to accomplish
the distortion correction. The method matches two images: a distorted image
acquired by the camera and a calibrated pattern without distortion. This process,
applied to several cameras, can (given certain mechanical considerations) build
a standard stereo system, provided that the cameras end up capturing the same
calibration pattern.

The rectification process we use is the following: 

1. Mechanically approximate the stereo geometry to a standard geometry. In
Figure 1(a) the stereo geometry of the cameras is shown. A strategy that
worked for us is to place a pattern of three dots distributed in a way that
corresponds to the stereo geometry (see Figure 1(b)), and then mechanically
align the geometrical center of each camera with its corresponding
reference dot.

2. Put the calibration pattern over a plane Π that is orthogonal to the
projection axis of the cameras, in an initial position such that it is aligned
and centered with the left camera, and acquire the image.

3. Slide the calibration pattern by dX over the Π plane and acquire a second
image with the topmost camera.

4. Slide the calibration pattern by dY from its initial position and acquire a
third image with the right camera.

5. Finally, apply the calibration method of [4] using the same calibration
pattern (see Fig. 5(a)) and each acquired image.

Fig. 1. Process of image rectification: (a) Stereo Geometry, (b) Dot pattern

The process generates three correction files, one for each camera, that allow
us to convert the images to their corresponding corrected and rectified images
(see Figure 2 for an example of the left and right cameras).

3 Integrating the Laser Telemeter with the Stereo 
Geometry 

Using a reference frame centered on the left camera, as shown in Fig. 1(a), the 
laser plane is mechanically aligned as shown in Fig. 3. It is straightforward to 
map points measured by the laser sensor to the reference frame. 




Fig. 2. Rectified left and right images

Fig. 3. Integrating the laser with the stereo geometry



4 Mapping Laser Reads to Images 



Given that the lens distortion has been corrected and the images are rectified,
we can use the perspective model (pinhole model) [7], which allows us to convert
3D coordinates to image coordinates. The perspective model assumes no
distortion, so the coordinates $(x_i, y_i)$, within the left image, of a laser
point or measurement with coordinates $(X_i, Y_i, Z_i)$ are given by:

$$x_i = f\,\frac{X_i}{Z_i}, \qquad y_i = f\,\frac{Y_i}{Z_i},$$

where $f$ is the focal distance.
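
The projection above can be written as a short Python function. This is only an
illustrative sketch: the name `project_point` is hypothetical, the point is
assumed to be already expressed in the left-camera reference frame, and points
at or behind the camera plane are rejected.

```python
def project_point(point, focal):
    """Project a 3D point (X, Y, Z) in the left-camera frame to image
    coordinates (x, y) using the pinhole model x = f X / Z, y = f Y / Z."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal * X / Z, focal * Y / Z)
```

For example, `project_point((1.0, 2.0, 4.0), 2.0)` gives `(0.5, 1.0)`.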



5 Laser Data Propagation 

Information obtained from the laser is used as disparity seeds. The disparity
$d_i$ is obtained by [7] $d_i = f\,dY / Z_i$ (see Fig. 1(a) for $dY$). Our main
goal is for these seeds to transmit their disparity to their neighbors, and so
on. However, each pixel in the neighborhood of a pixel with a disparity value
should adopt that disparity only if it passes the following tests:

— Texture Test - Pixel texture should be similar to the texture of its
neighbors.

— Horizontal Correspondence Test - We try to find a match between the given
pixel (in the left image) and its respective pixel in the right image, by using
the left and the right images, as well as the disparity information $d_i$ stored
in the neighboring pixel.

— Vertical Correspondence Test - It is similar to the previous test, but now we
use the information from the top and bottom images.

Each pixel inherits the disparity of its neighbor only if it has passed
the previous tests. To expand the disparity seeds through the whole image we
apply a breadth-first search. At the beginning, only pixels with disparity values
computed from the laser data are in the list of nodes to explore. Each node taken
from the list is expanded to its eight neighboring pixels.
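
The propagation just described can be sketched as a breadth-first search over
the pixel grid. The Python code below is a simplified illustration, not the
authors' implementation: the three tests are abstracted into a single
caller-supplied predicate `passes_tests`, and all names are hypothetical.

```python
from collections import deque


def propagate(seeds, shape, passes_tests):
    """Spread seed disparities to 8-connected neighbors, breadth first.

    seeds        -- dict mapping (row, col) to a disparity from the laser
    shape        -- (rows, cols) of the image
    passes_tests -- callable(pixel, neighbor, disparity) -> bool standing in
                    for the texture and correspondence tests
    """
    rows, cols = shape
    disparity = dict(seeds)
    queue = deque(seeds)  # initially only the laser seeds are explored
    while queue:
        r, c = queue.popleft()
        d = disparity[(r, c)]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                pixel = (r + dr, c + dc)
                if pixel == (r, c) or pixel in disparity:
                    continue
                if not (0 <= pixel[0] < rows and 0 <= pixel[1] < cols):
                    continue
                # The pixel inherits the disparity only if it passes the tests.
                if passes_tests(pixel, (r, c), d):
                    disparity[pixel] = d
                    queue.append(pixel)
    return disparity
```

A pixel that fails the tests for one neighbor stays unlabeled and may still be
reached later from a different neighbor, so regions separated by texture edges
are filled only from seeds on their own side.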



6 Correlation Method 

In order to complement our propagation method, we use Shirai’s approach [7],
a correlation method that assesses disparity between detected borders within
stereo images. This algorithm is based on the fact that detected borders are
more suitable for matching than homogeneous areas. Furthermore, this criterion
decreases computational cost. Shirai’s algorithm uses dynamic-length windows.
Hence, narrow and wide windows are used in large and shorter searching regions,
respectively. Image gradient is used as a border detector, which is given by

$$|\nabla I(x, y)| = \sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2}$$

A border has been detected if $|\nabla I(x, y)| > T_v$, where $T_v$ is a
previously defined threshold.
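
As an illustration of this border test, the following Python sketch thresholds
the gradient magnitude of a grayscale image stored as a list of rows. The use
of central differences to approximate the partial derivatives is an assumption
(the text does not say which approximation is used), and the function name is
hypothetical.

```python
import math


def border_mask(image, threshold):
    """Mark pixels whose gradient magnitude exceeds the threshold.

    image is a list of equally long rows of gray levels. Pixels on the
    image boundary are left unmarked because central differences need
    both neighbors.
    """
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mask[y][x] = math.hypot(gx, gy) > threshold
    return mask
```

On a vertical step edge the mask is set on the two columns adjacent to the step
and stays unset in the flat regions on either side.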

The following proposed criteria, within Shirai’s algorithm, aim to find
the right correspondence.

— All the matches detected with Shirai’s algorithm are stored.

— The mean error $\mu$ and standard deviation $\sigma$ are obtained from the
stored matches.

— If a minimum match is located far enough from the other ones, say at least
$2\sigma$, it is considered to be a suitable correspondence.
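
One possible reading of these criteria is sketched below in Python: the best
(minimum-error) match is accepted only if its error lies at least $2\sigma$
below the mean error of all stored matches. This interpretation of "far enough
from the other ones", as well as the function name and the factor `k`, are
assumptions made for illustration.

```python
import statistics


def accept_best_match(errors, k=2.0):
    """Return the index of the minimum-error match if it stands out by at
    least k standard deviations from the mean error, otherwise None."""
    if len(errors) < 2:
        return None
    mu = statistics.mean(errors)
    sigma = statistics.pstdev(errors)
    best = min(range(len(errors)), key=errors.__getitem__)
    if sigma > 0 and mu - errors[best] >= k * sigma:
        return best
    return None
```

When all candidate matches have similar errors, as in a homogeneous region, no
match is accepted, which is where the propagation from laser seeds takes over.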

A similar process is carried out to find suitable correspondences with the 
third image. 

Disparity maps are built in a mutually exclusive way by using the propagation
described in the previous section and the modified Shirai’s method.
Hence, the propagation method is used within homogeneous regions, and Shirai’s
method is activated within detected border regions.



7 Experimental Results 

Experiments were performed using a mobile robot with three FireWire
cameras (resolution 640 x 480, with 2.1 mm lenses) and a SICK Laser Measurement
System model LMS209-S02. The laser sensor covers a 180° range with a 0.5°
resolution and detects a maximum distance of 32 m. Since our method requires
non-distorted images, we first apply an extension 1 of the algorithm shown in
[4]. Figure 5 shows an example of correcting the image distortion produced
by the cameras. The synthetic calibration pattern is shown in Fig. 5(a),
the image captured by the camera is shown in Fig. 5(b), and the image
obtained from the calibration process is shown in Fig. 5(c), which shows a very
good result.

Fig. 4. Results obtained from stereo images and laser data: (a) Original image, (b) Shirai’s method, (c) Proposed method

Fig. 5. Correcting image distortion by using a register technique: (a) Calibration pattern, (b) Distorted image, (c) Corrected image

Figure 4 shows an application of the proposed method. Figure 4(b) shows
the response of Shirai’s method when a maximum window length of 7 pixels and
a search region of 80 pixels are used. In this case, the initial parameters
$T_v = 16$ and $2\sigma$ were used.

The disparity map obtained by combining both of the previous methods,
correlation and propagation, is shown in Fig. 4(c). In this case, the similarity
threshold among neighbors was 16 gray tones. It can be observed that this
process detects homogeneous regions better.



1 The original approach uses the $k_1$ and $k_2$ parameters to model distortion, which was not
enough to correct the distortion of lenses with a 2.1 mm focal distance. We include another
parameter, $k_3$.

8 Conclusions

We have presented a method that combines a laser range finder and a stereo 
vision system to build better disparity maps. Fusing only laser data and stereo 
range data does not include information from homogeneous surfaces above or 
below the laser plane. 

The proposed approach takes advantage of including the laser data in the
trinocular stereo vision system and works well with homogeneous surfaces.
Future work includes further tests and the development of a probabilistic
propagation to replace the deterministic propagation described here. The idea
is to include conditional probabilities of disparities given by the laser
sensor, the stereo vision system and the disparities of neighboring pixels.

Abstract. Robotic mapping is one of the most important requirements for a
truly autonomous mobile robot. Mobile robots should be able to build an
abstract representation of the physical environment in order to navigate and
work in it. This paper presents an adaptive way to build such a representation.
The proposed system allows the robot to explore the entire environment
and acquire the information coming from its sensors (presence or absence of
obstacles) while it travels. The robot may start the mapping process at any
point of the space to be mapped. Due to the adaptability of the chosen method,
the process is capable of dynamically increasing its memory requirements
according to the already mapped area, even without any a priori knowledge of
the environment.

Keywords: Adaptive Automata, Robotics, Navigation and Robotic Mapping. 



1 Introduction 

Early approaches for allowing mobile robots to move around employed a
preliminary map of the environment stored in the robot's memory. Those
approaches do not provide an adequate solution, since storing a complete
geometrical map of the environment, searching the database for localization,
and the path planning process significantly increase the computational
complexity of the system, making the approaches prohibitive for actual
implementations [2].

Another problem related to those approaches is the repetitive nature of
non-automatic mapping processes. Each unstructured environment in which the
robot is intended to work, such as buildings, offices, industries and
agricultural fields, has to be mapped first, and the resulting map must be
manually registered in its memory.

There is also the question of whether robots can work in hazardous
and unknown environments. Dangerous tasks like mining, undersea operations,
working in disaster areas, and space and planetary exploration are examples of
situations in which robots may have to map the field before being able to work
properly.

Stimulated by such reasons, robotic mapping has been a strongly researched topic
in robotics and artificial intelligence for two decades, and it still presents
challenging research subjects, e.g., mapping dynamic or large areas [9].

The present work proposes an adaptive mechanism to steer a robot to cover all of
its unknown environment and, during this exploration, to collect information
from its sensors and organize it as a map. This map is built in such a way that
the robot’s navigation becomes easier.

2 Related Work 

Since the early 1980s, robotic mapping research has been split between metric and
topological approaches. Metric maps represent the environment by using its geometric
properties [4] [8]. Topological maps describe environments as a set of important
places, which are connected by arcs [2] [6]. These arcs have attached information on
how to navigate between such places. Nevertheless, the exact frontier between these
approaches has always been fuzzy, since topological maps rely on geometric information
about the world [9].

Adaptive devices change their structure and behavior according to external
stimuli. Such a feature provides an intuitive and reliable way of modeling physical
environments and of guiding the robot, despite the complexity of the environment.

2.1 Adaptive Automata 

Adaptive automata, first proposed in [7], extend the concept of finite automata by
incorporating dynamic self-reconfiguration in response to
externally collected information. Such behavior provides adaptive automata with
learning capability, which makes them suitable for representing knowledge.

It has been shown that adaptive automata are Turing-powerful devices [7], and
they have been applied in several domains, such as pattern recognition [3]
and systems description [1].

Adaptive automata may be viewed as self-modifying state machines whose structure
includes a set of states and a set of transitions interconnecting such states. States
may be classified into an initial state, a set of final states, and a set of intermediate states.
Incoming stimuli change the internal state of the machine.

The self-modifying feature of an adaptive automaton is due to its capability of
changing its own set of transition rules. Adaptive actions may be attached to the
transitions; these actions are able to either add new states and transitions or remove
existing ones, thereby achieving a new structure. Hence, incoming stimuli may
change the set of internal states and modify the general configuration of the automaton.
See [7] for details on concepts and notation.

Transition rules in adaptive automata are represented as: 

(g, e, a) : B → (g’, e’, a’) : A

g: push-down store contents before the transition;
g’: push-down store contents after the transition;
e: current state before the transition;
e’: current state after the transition;
a: input stimulus before the transition;
a’: input stimulus after the transition;
B: adaptive action before applying the transition;
A: adaptive action after applying the transition.

Adaptive actions A and B are both optional. Three different elementary adaptive 
actions are allowed: inspection - search the current state set for a given transition; 
deletion - erase a given transition from the current state set; and insertion - add a 
given transition to the current set of states. Such actions are denoted by preceding the 
desired transition by the signs ?, - and +, respectively. Figure 1 shows a graphic 
representation of the transition. 
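
The three elementary adaptive actions can be illustrated with a small sketch. This is only an illustration under simplifying assumptions, not the formalism of [7]: the push-down store and the "before" action B are omitted, and the class and method names (AdaptiveAutomaton, inspect, delete, insert, grow) are hypothetical.

```python
# Sketch of the elementary adaptive actions (?, -, +) acting on a transition set.
# All names are illustrative; the push-down store and the "before" action B are omitted.

class AdaptiveAutomaton:
    def __init__(self, initial_state):
        self.state = initial_state
        # Each rule maps (state, stimulus) -> (next_state, adaptive_action or None)
        self.rules = {}

    def inspect(self, state, stimulus):
        """'?' action: search the current rule set for a transition."""
        return self.rules.get((state, stimulus))

    def delete(self, state, stimulus):
        """'-' action: erase a transition if it exists."""
        self.rules.pop((state, stimulus), None)

    def insert(self, state, stimulus, next_state, action=None):
        """'+' action: add (or overwrite) a transition."""
        self.rules[(state, stimulus)] = (next_state, action)

    def step(self, stimulus):
        """Consume one stimulus; return False if no transition applies."""
        rule = self.inspect(self.state, stimulus)
        if rule is None:
            return False
        next_state, action = rule
        self.state = next_state
        if action is not None:
            action(self)  # adaptive action applied after the transition (A)
        return True


def grow(automaton):
    # Adaptive action: once "q1" is reached, create a new transition to a new state "q2"
    automaton.insert("q1", "b", "q2")


m = AdaptiveAutomaton("q0")
m.insert("q0", "a", "q1", action=grow)
```

Before the stimulus "a" is consumed, the state "q2" is not reachable; the transition leading to it is created by the adaptive action attached to the first transition.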



Fig. 1. Simplified transition (e, a) : B → e’ : A when g, g’ and a’ are omitted



2.2 The Mapping Automaton 

The initial work on representing physical environments by using adaptive automata
was proposed in [5]. The present paper extends this work.

The adaptive mapping automaton starts from a square lattice (figure 2a) consisting 
of nine states connected by special transitions, all of them denoting areas to be 
mapped. The central state is the initial state of the automaton, and represents the start- 
ing point of the exploring path. 

In order to allow a clear presentation of the method, a graphic representation of
the adaptive automaton will be used. The dot-marked state corresponds to the current
position of the robot, and single lines represent areas not yet mapped.

The initial automaton also has special tags (X), marking corner states, and special
transitions, which are provided for supporting expansions of the lattice, as shown in
figure 2b.

The automaton replaces the four adjacent non-filled transitions according
to the information collected by the robot’s sensors while it performs the exploring
moves. The information collected by the sensors indicates the direction
(north, south, east or west) and the condition of the place (free or busy).

Figure 2c shows one possible example of the four-data information collected by 
sensors and stored by the automaton. Double arrows indicate non-obstructed areas 
and bold lines denote obstructed ways. 

In order to exemplify a complete map, figure 3a illustrates the representation of
the information acquired by the automaton after exploring a simple room. The dot-marked
state shows that the robot completed the exploration at the rightmost upper
space of the room.

Fig. 2. (a) Initial lattice in an adaptive automaton, (b) Special tags and transitions in the initial
automaton, (c) Example of information coming from the sensors: two obstructed directions and
two free directions



To keep the relation between the part of the map already built and the real environment,
the initial state of the automaton is adopted as the origin of the
map. This state corresponds to the place where mapping started. Thus, each point in the map is
associated with a point in the physical environment through the association of each
transition in the map’s representation with the corresponding displacement performed
by the robot in the real world.

Note that in actual applications the automaton is represented in the algebraic form
(as described in section 2.1).

This mapping process allows memory usage to grow dynamically with the
amount of already mapped area. This feature contrasts with classic approaches, such
as those described in [4], [6], [10].
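
The idea of memory growing with the mapped area, and of the origin anchoring the map to the real world, can be sketched as follows. Note that this sketch swaps the adaptive automaton for a plain dictionary keyed by displacement from the origin; the names GridMap, record, condition and move are hypothetical and serve only to illustrate the principle.

```python
# Sparse map keyed by displacement from the origin (the initial mapping place).
# A dictionary stands in for the adaptive automaton; names are illustrative.

NOT_MAPPED, FREE, OCCUPIED = "not mapped", "free", "occupied"
STEP = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


class GridMap:
    def __init__(self):
        self.cells = {(0, 0): FREE}  # only the origin is stored at the start
        self.robot = (0, 0)

    def record(self, readings):
        """Store sensor readings {direction: FREE or OCCUPIED} around the robot."""
        x, y = self.robot
        for direction, condition in readings.items():
            dx, dy = STEP[direction]
            self.cells[(x + dx, y + dy)] = condition

    def condition(self, cell):
        """Cells never recorded are reported as not mapped and cost no memory."""
        return self.cells.get(cell, NOT_MAPPED)

    def move(self, direction):
        """Move one cell; each displacement in the world is one step in the map."""
        dx, dy = STEP[direction]
        target = (self.robot[0] + dx, self.robot[1] + dy)
        if self.condition(target) != FREE:
            raise ValueError("target cell is not known to be free")
        self.robot = target


grid = GridMap()
grid.record({"north": FREE, "south": OCCUPIED, "east": FREE, "west": OCCUPIED})
grid.move("north")
```

After one sensor reading the map holds five cells (the origin and its four neighbours), so storage follows the explored area rather than the size of the whole environment.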

3 The Model 

The proposed model to perform robotic mapping and exploring motion by using 
adaptive automata is depicted in figure 4. 

In this proposal, an information management system supplies the exploring- 
motion automaton with data collected from the sensors, and the current neighborhood 
information previously modeled in the mapping automaton. Data collected by the 
sensors contain information on the direction (north, south, east or west) and condition 
(occupied or free). Data from the map contain information on the two adjacent states 
in the four directions and their condition (free, occupied, not mapped or not created 
state). Figure 3b illustrates an example of information from the map representing the 
two south states in an occupied condition and all other states in a free condition. 

The output information generated by this automaton indicates in which direction
the robot is going to move. Depending on the configuration of the environment, the
exploring-motion automaton also outputs landmark information for
future assistance to the navigation.

The management system supplies the mapping automaton with the information collected
from the sensors, followed by the direction information and the landmark, if
one occurs. The mapping subsystem is responsible for storing all the sensor information
on the presence or absence of obstacles close to the robot and the auxiliary navigation
landmarks. The motion decision is also transferred to the motion system that
controls the robot’s motors.





Fig. 4. System model 



Note that the environment, sensors and motions have been simulated in order to
validate the proposed map building mechanism.

4 The Exploring-Motion Automaton

As described in section 3, an adaptive automaton is used for determining the robot’s
next move. For this purpose, it is supplied with information collected by the sensors
and with neighborhood information previously registered in the map. Its operation
allows the robot to cover the whole environment by describing a zig-zag path. This
section illustrates some details of this procedure through the following example:

a. Starting at any point of the environment, the automaton leads the robot north
until it finds an obstacle (Figure 6a).

b. The exploring-motion automaton conducts the robot from north to south until it
finds an obstacle. Then, it turns around and comes back along a parallel path, describing
a zig-zag, which grows towards the east (Figure 6b).

c. After each step towards the east and before the robot comes back along the parallel path,
the automaton searches for a free space “behind” the robot, fills this space (if
one exists), and then returns to its zig-zag path (Figure 6c).

d. When this east-growing zig-zag path is exhausted, the robot returns along
the same trajectory, searching for a sequence of “free - not mapped” adjacent
states to the east or west (for this purpose the automaton uses information extracted
from the map, as shown in figure 3b). Such a sequence means that there is a non-mapped
space in this direction and the way is free to reach it. Note that such a
space may be a simple place or another environment as complex as the already
mapped one (Figure 6d).

e. When such a new space is located on the eastern side of the robot, it explores this
space by performing the zig-zag path, as described in step ‘b’ (Figure 6e).

f. When such a new space is located on the western side of the robot, it explores this
space by performing the zig-zag path, but grows to the west only after ensuring
that the sequence of “free - not mapped” adjacent states still exists in this direction.
A sequence of “free - free” adjacent states means that this space has already been
mapped, and the automaton leads the robot backwards, as described in step ‘d’
(Figure 6f).

g. When the robot returns to its initial point of exploration, the automaton leads it through
the same sequence described above, changing ‘east’ to ‘west’, i.e., the zig-zag
path grows to the west until exhausted, and displacements to the east are conditioned
on the occurrence of a sequence of “free - not mapped” adjacent states in this direction
(Figure 6g).

h. When the robot returns again to its initial point of exploration, the automaton
signals that the environment is entirely explored and filled (Figure 6h).
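
The test used in steps d and f, looking at the two adjacent states to the east or west for a “free - not mapped” sequence, can be sketched as below. This is a minimal illustration assuming the map is available as a dictionary of cell conditions keyed by grid coordinates; the function names two_ahead and find_unexplored_side are hypothetical, and the adaptive automaton itself is not modelled.

```python
# Sketch of the "free - not mapped" search performed while returning along the trail.
# The map is assumed to be a dict {(x, y): condition}; names are illustrative.

NOT_MAPPED, FREE, OCCUPIED = "not mapped", "free", "occupied"
STEP = {"east": (1, 0), "west": (-1, 0)}


def two_ahead(cells, position, direction):
    """Conditions of the two adjacent states in a direction, nearest first."""
    dx, dy = STEP[direction]
    x, y = position
    return tuple(cells.get((x + k * dx, y + k * dy), NOT_MAPPED) for k in (1, 2))


def find_unexplored_side(cells, trail):
    """Walk back along the trail and return (position, direction) for the first
    'free - not mapped' pair found to the east or west, or None if there is none."""
    for position in reversed(trail):
        for direction in ("east", "west"):
            if two_ahead(cells, position, direction) == (FREE, NOT_MAPPED):
                return position, direction
    return None


# A two-cell corridor with an opening to the east of its upper cell
cells = {
    (0, 0): FREE, (0, 1): FREE,
    (1, 0): OCCUPIED, (-1, 0): OCCUPIED, (-1, 1): OCCUPIED,
    (1, 1): FREE,  # free cell whose eastern neighbour has not been mapped yet
}
trail = [(0, 0), (0, 1)]
found = find_unexplored_side(cells, trail)
```

A “free - free” pair is not reported, which matches the rule in step f that such a sequence denotes an already mapped space.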

The zig-zag path has been chosen for its generality and because the environment
is completely unknown. Some approaches divide the environment into rectangles
during the exploration. In fact, a zig-zag path may be interpreted as a rectangle
with unitary side whose inside is completely known, which is an important feature
in the mapping process.

When a new space is found during the exploration, it is entirely explored before
proceeding. Such a depth-first search has been chosen because a breadth-first search implies
increasing the actual motion of the robot; exploring an entire branch before
moving to another one is usually cheaper.

While the environment is explored, the exploring-motion automaton may signal to
the mapping automaton some special states, or landmarks, which are marked
on the map. During the navigation process such landmarks are helpful for planning a
trajectory from some initial position to a target position. The system calculates the
path between such landmarks and, during navigation, it must find which landmarks
are nearest to the initial and to the target positions [12], [13]. Those landmarks
may then be viewed as sub-goals in the navigation process. In order to implement such
sub-goals, the exploring-motion automaton searches for obstacles to the east or to the
west during the zig-zag. If an obstacle is detected (a wall, for instance), the proposed
automaton marks the central state of the free space before and/or after the wall and,
during the return trail, it signals to the mapping automaton that this state is a sub-goal
(Figure 6i).

Fig. 6. (a) Initial north move, (b) East-growing zig-zag path, (c) Space filling before complete
zig-zag path, (d) Exhausted east-growing zig-zag path, (e) East new space found during the
return move, (f) West new space found during the return move, (g) Exhausted west-growing
zig-zag path, (h) Environment entirely explored, (i) Landmarks defined for the example
environment



The present proposal allows the robot to cover the whole environment despite its complexity
and to start at any point of the space to be mapped. These features are advantages
when compared to other approaches (such as the one presented in [11]).

4.1 Description of the Automaton

For brevity, the motion of the exploring automaton is described in
table 2, which expresses the relation between the system’s situations of exploration
(directions), the incoming sensor information and the information encoded in the
map. Table 1 shows the encoding of table 2.

Table 1. Encoding of table 2

Direction 5: North complementary east-growing
Direction 6: North west-growing
Direction 7: North complementary west-growing
Direction 8: South west-growing
Direction 9: South complementary west-growing
Direction 10: Return from east
Direction 11: Return from west

Table 2. Situations of exploration versus the incoming information from sensors and map

5 Conclusion and Future Work

Robotic mapping is an essential feature to allow robots to complete certain tasks in
unstructured and unknown environments. This work has shown an alternative to the
classic mapping approaches: adaptive algorithms provide a new way to build maps
and guide the robot through unknown environments, covering all the space. During the
exploration, sensors attached to the robot scan the environment for the presence or
absence of nearby obstacles, and such information is collected into the model by enabling
the automaton to perform appropriate self-modifications. The exploring-motion
automaton also provides special landmarks to the map, which may be used as sub-goals
in further navigation. The present proposal has the advantage of allowing the
robot to explore complex environments without a priori knowledge of the place, and
the advantage that memory usage grows with the area actually mapped.

Future work should address the navigation problem by using the landmarks and
the map created.

Abstract. Functional magnetic resonance images (fMRI) were analyzed to investigate
the cortical regions involved in stereoscopic vision using red/green
anaglyphs to present random dot stereograms. Two experiments were conducted,
both of which imposed high attentional demands. In the first experiment
the subjects were instructed to follow the path of a square defined by depth and
moving in the horizontal plane, contrasted with a similar sized square defined
by a slight difference in luminance. Three main regions were identified: V3A,
V3B and BA7. To test that the observed activations were not produced by the
pursuit eye movements, a second experiment required the subjects to fixate
whilst a shape was presented in different random orientations. Our results suggest
that areas V1, V3A and precuneus are involved in stereo disparity processing.
We hypothesise that the activation of the V3B region was produced by
the second order motion component induced by the spatio-temporal changes in
disparity.



1 Introduction 

Although many psychophysical studies have investigated how the human brain computes
stereoscopic information, it is uncertain which cortical areas are involved in its
implementation. Some electrophysiological studies in monkeys report the sensitivity
of V1 to absolute disparities, suggesting that this area could be a preliminary stage of
processing for stereo information [1]. MT/V5 in monkeys shows a columnar organisation
tuned for disparity [2]. MT/V5 in human brains has been widely reported as a
motion sensitive area [3-5]. Given the similarity between the visual system of the
monkey and the human, it is possible that V5 in human brains is involved in the
processing of stereo information.

Recently, modern non-invasive neuroimaging techniques, e.g. PET and fMRI,
have been used to explore the functional anatomy of stereoscopic vision in humans.
The results of these studies suggest that stereo disparity processing may be widespread
over a network of cortical regions in the occipital and parietal lobes, including V1,
V2, V3, V3A and V3B [6-10]. However, there is no general agreement about the cortical
regions selective to stereo disparities or the specific role that each of these has in the
perception of depth [11]. The main goal of the present study was to process functional
magnetic resonance images to investigate the cortical regions sensitive to stereo disparities
using a stereo stimulus that avoids adaptation and at the same time maximises
the attentional demands. Under this principle, two experiments were developed using
functional magnetic resonance imaging to identify stereo sensitive regions stimulated
by random dot anaglyph stereograms.

2 Materials and Methods 

Ten healthy volunteers, nine right-handed and one left-handed (7 female, 3
male), aged from 20 to 30 years participated in the first experiment, and five healthy
right-handed subjects (2 female, 3 male) aged from 20 to 30 years participated in the
second experiment. All subjects gave informed written consent. The stereo acuity of
the subjects was measured using a stereo vision test (RANDOT SO-002); all of them
were below 40 sec of arc. The subjects were given a preliminary practice session
outside the magnet to become familiar with the visual stimulation.

2.1 Stimulus Presentation 

Subjects lay on their backs in the magnet. They wore red/green anaglyph glasses and
looked via a mirror angled at ~45° from their visual axes at a back-illuminated screen
located just outside the magnet. The viewing distance was 2.4 m. Stimuli were projected
onto the screen using an EPSON (EMP-7300) projector driven by a 3G Mac
running Psychophysics Tool Box ver. 2.44 [12, 13] under MATLAB ver. 5.3. Although
the stimuli were displayed at a video frame rate of 60 Hz, the image was only
updated on every 10th frame, producing an effective frame rate of 6 Hz.

2.2 Data Acquisition 

Subjects were scanned in a 1.5 T whole-body MRI scanner (Eclipse, Marconi Systems)
with BOLD contrast echo planar imaging (TR = 3 s, TE = 40 ms, 128 x 128 voxels,
voxel size 1.875 x 1.875 x 4 mm). Thirty-two slices covering the whole brain were
acquired.

2.3 Data Analysis 

The data were pre-processed and analysed using SPM99 (Wellcome Department of
Cognitive Neurology). The first five scans of each run were discarded to exclude
magnet saturation artefacts. All volumes were slice timed, motion corrected and normalised
into the MNI (Montreal Neurological Institute) stereotaxic space. The data
were smoothed using a 6 mm FWHM (full width at half maximum) isotropic Gaussian
kernel. Data analysis was performed using a boxcar design matrix of the different
conditions convolved with the hemodynamic response function. Specific effects
were tested by applying the corresponding linear contrast to the parameters obtained
by fitting the General Linear Model (GLM) with the design matrix [14]. The statistical
parametric maps (SPMs) were then interpreted in the standard way with reference to
the probabilistic behaviour of Gaussian random fields [17]. The threshold adopted
was P < 0.05 (corrected). Due to technical restrictions, it was not possible to perform
retinotopic mapping to identify the visual areas activated in our studies; instead, the
regions of activation identified through the statistical analysis were mapped to anatomical
locations using their Talairach coordinates as a reference. The labels assigned
to each region were chosen to match the anatomical locations reported by other
authors.
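
The core of the analysis, a boxcar regressor convolved with a hemodynamic response function and fitted by least squares, can be illustrated for a single voxel. This is a sketch under stated assumptions and not the SPM99 implementation: the double-gamma shape parameters are commonly used defaults assumed here, the voxel time series is synthetic and noiseless, and all function names are hypothetical.

```python
import math

TR = 3.0  # seconds per volume, as in the acquisition described above


def hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma response; the shape parameters are assumed defaults."""
    if t <= 0:
        return 0.0

    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(peak) - ratio * gamma_pdf(undershoot)


def boxcar(n_volumes, epoch_len):
    """0 during the first epoch (control), 1 during the next, alternating."""
    return [(i // epoch_len) % 2 for i in range(n_volumes)]


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    return [
        sum(signal[i - k] * kernel[k] for k in range(min(i + 1, len(kernel))))
        for i in range(len(signal))
    ]


def fit(y, x):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


kernel = [hrf(k * TR) for k in range(11)]      # response sampled from 0 to 30 s
regressor = convolve(boxcar(100, 10), kernel)  # 100 volumes, 10 per epoch
# Synthetic noiseless voxel: baseline 100 and effect size 2 are made-up numbers
voxel = [100.0 + 2.0 * r for r in regressor]
b0, b1 = fit(voxel, regressor)
```

With a single condition regressor the linear contrast reduces to the slope b1; with real data the estimate would be compared against the residual variance to obtain the statistic that is thresholded.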

3 Experiment 1: Global Stereo Tracking 

Experiment 1 was designed to activate stereo sensitive regions by requiring the subjects
to perform a task of global stereo tracking (GST). Random dot stereograms were
used to define a square region moving horizontally in front of the background from
left to right and vice versa. The subjects were instructed to perform pursuit eye
movements to follow the path of the square with their eyes. The performance of the
task depended on the maintenance of the perception of depth defined by stereo disparities.
The paper which explains this experiment in detail was submitted for publication
at the Mexican International Conference in Computer Science 2004 (ENC04).

3.1 Experiment Design 

Subjects were given a sequence of 4 scans (sequences), each lasting 5.15 min. (10
epochs), with a 5 min. interscan interval to permit subjects to rest. One hundred image
volumes were obtained in each sequence. Each condition lasted 30 s, giving 10 multislice
volumes per condition (TR = 3 s). Each scan consisted of alternating epochs in a
boxcar configuration. A dummy condition of a blank screen was presented during the
first 15 s (5 scans) of each run, which were excluded to control for magnetic saturation
effects. The display contained 1,024 dots (with radius 0.1 deg. and zero disparity)
distributed over the screen (mean dot density 1.5 dots deg^-2). The subjects were instructed
to fixate on the upper right corner of a square (side length 5.23 deg.) moving
laterally across the screen (13 deg. field of view).
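
The timing figures of the design can be cross-checked with a few lines of arithmetic. Two assumptions are made in this sketch and are not stated in the text: that the 100 volumes per sequence exclude the 5 dummy scans, and that "5.15 min." is read as 5 min 15 s. The constant names are illustrative.

```python
# Cross-check of the block design timing (assumptions noted in the comments).
TR_S = 3.0                  # repetition time in seconds
VOLUMES_PER_SEQUENCE = 100  # assumed to exclude the dummy scans
DUMMY_SCANS = 5             # blank-screen scans discarded at the start of each run
EPOCH_S = 30.0              # duration of one condition

volumes_per_epoch = EPOCH_S / TR_S                 # volumes acquired per condition
epochs = VOLUMES_PER_SEQUENCE / volumes_per_epoch  # epochs per sequence
analysed_s = VOLUMES_PER_SEQUENCE * TR_S           # analysed duration in seconds
total_s = analysed_s + DUMMY_SCANS * TR_S          # including the dummy period
minutes, seconds = divmod(int(total_s), 60)        # assumed reading of "5.15 min."
```

Under these assumptions the numbers agree with the description: 10 volumes per condition, 10 epochs, and a total of 315 s per sequence.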

The square was moved from left to right and vice versa at a constant speed (2.19
deg. sec^-1); each time the square reached one edge of the screen it changed its
direction. Dynamic random noise was used in order to remove any motion cues introduced
by the change in disparity [8]. Two modalities were used to define the square,
each one representing an experimental condition: a) Two dimensional tracking (2D):
The square was luminance defined; its luminance (8.56 cd/m2) was lower than that of the
background (18 cd/m2). This condition was used as a baseline. b) Three dimensional
tracking (3D): The square was depth defined (red/green anaglyph stereogram), presented
0.3 deg in front of the background (zero disparity). The square moved
horizontally in the X/Y plane, not in the Z plane (motion in depth). Stereo-sensitive regions
were identified by comparing the activation during the 2D control condition
with the neural activity during the 3D stereo tracking condition. Although we could
not track the eye movements to monitor the subjects’ performance of the task, we
have no reason to believe that eye movements differed between the two experimental
conditions. After both the training session and the scan session all the
subjects reported they could easily track the path of the square in both the 2D and 3D
conditions.

3.2 Results: Global Stereo Tracking 

Stereo-sensitive regions were identified using a contrast that compared the activation
produced by the moving square defined by depth with that produced by the moving
square defined by luminance. The results show consistent activation in areas V3A,
V3B and parietal cortex. Area V3A in the right hemisphere was activated by depth in
7 of the 10 subjects. In two of these subjects there was bilateral activation. Area V3B
in the right hemisphere was activated in 8 of the 10 subjects. In five of these there
was bilateral activation. One subject showed activation only in left V3B. Area BA7
(precuneus) in the right hemisphere was activated in 3 of the 10 subjects. Two subjects
had left hemisphere activation. There was no bilateral activation. Activation in
the right superior parietal area was found in 2 other subjects. It is important to clarify
that reversing the contrast revealed no regions consistent across subjects in which the
activations of the 2D tracking condition were greater than those of the 3D stereo tracking (data
not shown). Results of previous fMRI studies of eye movements have suggested the
involvement of parietal areas in the control of pursuit and saccadic eye movements
[17-18]. None report the involvement of the V3A and V3B regions. In this study, despite
the control in which the eye-tracking component was identical in the two conditions,
we find an increased activation in the parietal cortex in the stereo tracking task
in several of the subjects. This suggests some involvement of the parietal region in
stereo-processing per se, independent of eye movements. This will be examined further
in Experiment 2.

4 Experiment 2: Shape Discrimination from Stereopsis 

In order to avoid possible artefacts introduced by eye movements, in Experiment 2
the stereo sensitive regions were explored using a task whose performance depended
on the continuous perception of stereo depth without requiring tracking. A pie graph
(pacman) shape defined either by luminance (2D) or by depth (3D) was displayed at
the centre of the screen. The figure changed to one of four possible positions every
second. The subjects were instructed to fixate a central dot and press a button when
they identified a certain position of the figure. This task provided a way to assess how
well the subjects were performing the task.

4.1 Experiment Design

The stimulation sequences used in this experiment were similar to those used in Experiment 1.
The display contained 4,761 dots (radius 0.04 deg.) distributed over the
screen (mean dot density 7 dots deg^-2). A pacman shape (radius 4.3 deg.) was displayed
at the centre of the screen. The pacman shape was presented in one of four
possible positions (up, down, left, right), randomly changing every second. The
change in position was constrained to prevent a shape being shown continuously for
more than one second in any one position. The positions of the dots were changed
between frames (in both conditions) to produce the effect of dynamic random
noise [8], removing any possibility of monocular shape cues in the stereo
condition.

The subjects were instructed to fixate a point (0.3 deg. of radius and zero disparity)
in the middle of the screen (circular field of view 13 deg.) and press a button with
the right hand when the mouth of the pacman was in the up position. The response on
the button box was recorded to assess the performance of the subjects. There were
two modalities to define the pacman, each one representing one experimental condition:
a) Luminance (2D): The pacman was luminance defined; its luminance (9.29 cd/m2)
was lower than that of the background (17.7 cd/m2). Both the background and the pacman
were placed in the same plane (zero disparity). This condition was used as a baseline.
b) Depth (3D): The pacman was depth defined
(red/green anaglyph stereogram), and appeared in front (-0.076 deg. of disparity)
of the background (0.076 deg. of disparity). The pacman and the background
were displayed in front of and behind the fixation point, respectively, in order to remove
possible shape cues introduced by the red/green stereoscopic pair of dots in the stereo
condition. As in the previous experiment, the stereo-sensitive regions were identified
using a contrast which subtracted the neural activity during the control condition from
the neural activity during the stereo condition. The performance of the task was assessed
by counting all the occasions on which the subjects pressed the button at the right time
(when the pacman was in the up position) and then subtracting the occasions on which the subjects
pressed the button at the wrong time (when the pacman was not in the up position).
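
The performance measure described above (correct presses minus incorrect presses) can be written as a short function. This is a sketch only: representing the "up" periods as half-open time intervals and the name performance_score are assumptions made for illustration, and the example times are made up.

```python
def performance_score(press_times, up_intervals):
    """Presses made while the pacman was 'up' minus presses made at other times.

    press_times: button press times in seconds.
    up_intervals: list of (start, end) times during which the mouth pointed up.
    """
    hits = 0
    false_alarms = 0
    for t in press_times:
        if any(start <= t < end for start, end in up_intervals):
            hits += 1
        else:
            false_alarms += 1
    return hits - false_alarms


# Made-up example: two 'up' periods, two correct presses and one wrong press
score = performance_score([2.4, 5.1, 7.9], [(2.0, 3.0), (7.0, 8.0)])
```

In this example the score is 2 - 1 = 1; a subject pressing at random would tend towards a low or negative score.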

4.2 Results: Shape Discrimination from Stereopsis 

The stereo-sensitive regions were identified by comparing the activation produced by
the pacman defined by depth with that produced by the pacman defined by luminance.
Consistent with our expectations, V3A, V3B and precuneus were activated.
Activations were also found in striate cortex (V1). Area V3A was activated bilaterally
in 3 of the 5 subjects, and in the right hemisphere of 2 subjects. Consistent with
Experiment 1, area V3B was activated bilaterally in 1 subject, and in the right
hemisphere of 4 subjects. The precuneus was activated bilaterally in 1 subject and in the
right hemisphere of 4 subjects. The V1 region was activated bilaterally in 2 subjects,
in the left hemisphere of 2 subjects and in the right hemisphere of 1 subject. Figure 1
shows results from a representative subject, and Tables 1, 2, 3 and 4 show the MNI
coordinates of regions of activation for all the subjects.

Fig. 1. Depth against Luminance. The statistical map shows the areas sensitive to stereoscopic
information. The activation includes V1, V3A, V3B and precuneus



Table 1. Stereo sensitive region: V3A

Subject   Location         Z-Score   P corrected   Cluster size
1L        -24, -98, 16     5.30      0.000         1
1R        36, -88, 18      5.29      0.000         1
2R        28, -78, 24      3.11      0.077         1
3R        34, -90, 20      4.67      0.001         47
4L        -10, -104, 16    3.77      0.02          2
4R        36, -90, 16      5.99      0.000         1
5L        -22, -98, 18     7.21      0.000         52
5R        34, -94, 12      5.63      0.000         9



Table 2. Stereo sensitive region: V3B

Subject   Location         Z-Score   P corrected   Cluster size
1R        42, -82, 4       (Inf)     0.000         149
2R        38, -90, 2       3.96      0.056         39
3R        40, -80, 0       4.58      0.001         21
4L        -26, -100, 4     7.59      0.000         28
4R        36, -94, 6       (Inf)     0.000         103
5L        -36, -92, 4      5.47      0.000         7
5R        36, -88, -4      6.55      0.000         21

Table 3. Stereo sensitive region: Precuneus (BA7)

Subject   Location         Z-Score   P corrected   Cluster size
1R        32, -72, 52      7.06      0.000         1
2L        -26, -74, 46     4.67      0.003         4
2R        28, -66, 54      6.01      0.000         96
3R        14, -78, 52      3.90      0.015         5
4R        28, -64, 54      4.30      0.003         8
5R        14, -74, 54      5.47      0.000         1



Table 4. Stereo sensitive region: V1

Subject   Location         Z-Score   P corrected   Cluster size
1L        -8, -102, -2     (Inf)     0.000         236
1R        16, -100, 0      (Inf)     0.000         149
2L        -8, -106, -2     6.20      0.000         59
3L        -6, -104, -6     5.27      0.000         55
4L        -14, -106, 0     5.83      0.000         15
4R        16, -100, 6      6.65      0.000         214
5R        10, -100, 4      5.06      0.001         1



5 Discussion 

The results of our experiments reveal four main regions sensitive to stereoscopic
depth: V1, V3A, V3B and precuneus. In contrast to the evidence from physiological
experiments in monkeys [2, 19], we did not find evidence that V5 was involved in the
processing of stereoscopic information. The activation in V1 observed in Experiment
2 is unlikely to be due to the slight differences in illumination between the different
conditions, as this was similar in both experiments. It may be due to the difference in
disparities used in the two experiments. In Experiment 1 the disparity was 0.3 deg and
in Experiment 2 it was smaller (±0.076 deg). Because V1 is sensitive to a narrow
range of near zero disparities [10, 20], this may account for this difference in the
results. Consistent with the results of other studies, we find in both Experiments 1 and
2 that V3A and precuneus showed sensitivity to stereo disparities [6, 9, 10, 21]. It is
unlikely that these activations are due either to different patterns of eye movements or
to stronger attentional engagement. In Experiment 1 the control condition provided
identical requirements for these parameters, and similar areas of activation were found
in Experiment 2, in which no eye movements were required to perform the task.

To our knowledge only one other study has reported the V3B region as being
sensitive to stereoscopic information [10]. Other fMRI studies have implicated V3B in
motion processing. Orban et al. [22-23] showed that this area is sensitive to kinetic
boundaries (boundaries defined by motion differences), and Smith and Greenlee et al.
showed it to be activated by second order motion. They use the term first order
motion to refer to point-to-point changes in position over time, idealised as a single
point of light moving through space. Second order motion refers to the perception of
motion arising not from point-to-point temporal correspondence of luminance, but
from correspondence of higher order global properties, which in their study were
produced by the contrast modulation of a static texture.



Table 5. Comparative table of functional and anatomical profiles reported for the V3B region.
The table compares the functional profiles and the average Talairach coordinates of
the V3B region found in the present experiments with those reported by other authors. The
standard deviation from the mean is shown in brackets. E1 and E2 refer to the first and second
experiments reported in this paper

Ref.   x             y              z            Sensitive to
1      -25           -88            -1           Kinetic boundaries.
2      -28           -94            -4           Kinetic contours, shape and motion.
       34            -88            0
3      ±31           -91            0            Kinetic boundaries.
4      ±26 (8)       -89 (8)        -2 (8)       Second order motion.
5      Not given by the author                   Stereopsis.
E1     -27.3 (1.1)   -98.6 (3.1)    0 (5.5)      Stereopsis.
       36.2 (3.9)    -90.75 (5.9)   -0.7 (3.5)
E2     -31 (7.1)     -96 (5.6)      4 (0)        Stereopsis.
       38.4 (2.6)    -86.8 (5.7)    1.6 (3.8)





A possible explanation for the activation of V3B in our experiments is that it is 
due to motion cues introduced by moving shapes defined by coherent disparity in- 
formation. In Experiment 1 the task required the tracking of a shape over time and in 
Experiment 2 rotary motion occurred as orientation changed once a second. We pro- 
pose that activation in the V3B region might be involved in the segmentation process 
in three dimensional space, required by the global stereo tracking task and the shape
discrimination task. Other fMRI studies [21] investigating object shape perception
have found that cortical regions (Lateral Occipital) close to V3B were involved in the 
analysis of object structure independent of the cues (luminance, colour, depth) that 
define the shape. Other than this we have no explanation as to why the Backus study 
which used a static depth discrimination task reported V3B as sensitive to stereo dis- 
parities. That study did not report the anatomical co-ordinates of the V3B region 
activated, so we were not able to compare its location with the V3B region reported 
here. Table 5 shows the functional and anatomical profiles reported for the V3B 
region. 

6 Conclusions 

Our results suggest a network involving the cortical areas V1, V3A, V3B and
precuneus in the processing of stereo disparities. The high proportion of activations
located in the right hemisphere supports the notion of right cerebral dominance in
stereo vision. Contrary to the results reported in studies with monkeys, our
experiments did not reveal any evidence of the sensitivity of V5 to stereo disparity
processing. Our results also showed a region whose functional profile and anatomical
location matched the V3B region. We have suggested that this region was activated by
the stereoscopic motion component of the dynamic tasks used. Although both tasks used
in the experiments (global stereo tracking and form discrimination) produced motion
or spatio-temporal changes of shapes defined by disparity, neither of them was
designed specifically to produce disparity based second order motion optimally. The
objective of our studies was to process functional magnetic resonance images to
investigate the cortical areas selective to stereoscopic information; an experiment
designed to test the hypothesis of the selectivity of the V3B region to stereoscopic
motion is left for future work.

Abstract. In this document, we propose a machine vision system to
detect the predominant direction of motion of a Foucault pendulum. 
Given a certain configuration where the camera has a top view of the 
pendulum’s bob in motion, the system builds an adaptive model of the 
background. From it, the bob’s center of mass is computed. Then, an 
ellipse model is fitted to the trajectory. Finally, the noise in the observed 
predominant direction of motion is filtered out to get a robust estimate of 
its value. The system has proved to be quite reliable on a simple version 
of the Foucault pendulum where it was tested. 



1 Introduction 

A Foucault pendulum is an important device for a number of scientific and
educational reasons, which include: a) it can be used to show that the Earth
spins; b) it can be used to provide accurate time-keeping; and c) it can be used
to measure g, the acceleration due to gravity. In 1851, French physicist Jean
Bernard Léon Foucault proved that the Earth rotates by swinging a pendulum
attached to a building [9]. Foucault observed the forces acting on the pendulum.
Since there is no force making the pendulum rotate with respect to the floor,
Foucault concluded that it must be the floor that rotates with respect to the
pendulum. Currently, observations of the pendulum are made without the aid of
automatic instruments. This is wearisome and error prone. To appreciate what
it takes, consider Gillies' review [6] of Maurice Allais' observations of the Foucault
pendulum during an eclipse in 1954. Gillies reports that in his experiment, Maurice
Allais observed the pendulum every 14 minutes during 30 days and nights
without missing a data point. In this paper, we propose a machine vision system
to make these observations, and from them to study physical phenomena. For
example, during major eclipses around the world [6, 1, 5], it has been observed
how implementations of the pendulum experienced a perturbation, describing
an ellipse whose major axis deviated from the predicted plane of motion.

(a) A bob, hanging from a support through a wire, swings back and forth. A CCD
camera is placed close to the gripping system.

(b) Camera's top view of the experimental setup.

Fig. 1. A machine vision system to compute the predominant direction of motion in a
Foucault pendulum

Current state of the art implementations of a Foucault pendulum aim to
overcome problems such as the precession due to ellipticity [2] and the loss of
amplitude due to the gripping method used to hold the wire to the building [10].
With these features, a Foucault pendulum can be used to keep track of time.
For a pendulum at the North or South Pole, the floor under the pendulum would
twist around the Earth's axis every 24 hours. For a pendulum on the equator
the floor would not twist at all, but the building would travel eastward around the
Earth's axis. For places at other latitudes, some twisting and some travelling
take place. However, while the twisting can be seen, the eastward travelling motion
of the pendulum cannot. The degree of twist depends on the latitude $\phi$ by
$n = 360^\circ \sin\phi$ [9]. For instance, consider a point in Mexico City with latitude
19°00'26'' N. There, the pendulum will rotate about 117°14'50'' in 24 hours.
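
As a quick check of the figure quoted above, the following minimal sketch evaluates $n = 360^\circ \sin\phi$ for the Mexico City latitude. The function names are illustrative only and do not come from the original system.

```python
import math

def dms_to_degrees(degrees, minutes=0.0, seconds=0.0):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0

def daily_precession_deg(latitude_deg):
    """Rotation of the swing plane in 24 hours: n = 360 * sin(latitude)."""
    return 360.0 * math.sin(math.radians(latitude_deg))

def degrees_to_dms(value):
    """Split a positive angle into (deg, min, sec), truncated to whole seconds."""
    d = int(value)
    m = int((value - d) * 60.0)
    s = int(((value - d) * 60.0 - m) * 60.0)
    return d, m, s

latitude = dms_to_degrees(19, 0, 26)       # 19 deg 00' 26'' N
n = daily_precession_deg(latitude)
print(degrees_to_dms(n))                   # (117, 14, 50)
```

The result agrees with the 117°14'50'' stated in the text when the seconds are truncated rather than rounded.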

In this document, we present a vision system to measure the direction of
motion of a Foucault pendulum. To that end, we grab images from a camera
placed close to the gripping system, aligned with the vertical direction (see Fig.
1). Our system tackles three different aspects of the problem. First,
in §2, we describe how the bob can be detected under changes in the background
scenario. Then, in §3, we compute the bob's trajectory by fitting an ellipse to the
set of observed positions. Finally, in §4, the noise in the estimated predominant
direction of motion is filtered out using a Kalman filter.

Fig. 2. State diagram for detecting the Foucault pendulum bob (see the text for details)



2 Bob Detection 

Background subtraction is a technique commonly used to detect moving objects 
in a sequence of images. It has the advantage that it is easy to compute. How- 
ever, due to dynamic changes in the scene being observed, in most cases an 
adaptive background construction model needs to be implemented for long term 
observation. Nowadays, a number of adaptive techniques have been proposed. 
Current solution models show a clear trend. Most of them model the pixel's
intensity value dynamics with a set of statistical models. For instance, Stauffer and
Grimson [12] use a mixture of Gaussian models. For a given observation, the most
suitable model is used. Monnet et al. [8] use Principal Component Analysis to 
create a model of the variability of the pixel dynamics over a sequence of frames. 
The principal components are computed on the covariance matrix built from 
the pixel intensity, over the previous m observations. Others, like Davis et al. [3], 
make no prior assumption about the underlying statistical model until enough 
evidence is accumulated. It has been argued that Kalman-based approaches [11] 
are robust but respond slowly to changes. For this work, we implemented an 
adaptive background construction model based on several Kalman filters for a 
given pixel. A particular filter is used depending on how well it describes the cur- 
rent pixel process status. Otherwise, a new filter is initiated. The model has the 
advantage of responding quickly to changes in illumination conditions because 
it retains a certain extent of memory about historic background tendencies. 

Kalman filters address the problem of estimating the state $x \in \mathbb{R}^n$ of a
discrete time controlled process [7]. The dynamics of a linear stochastic process
can be described for iteration k by the linear stochastic difference equation [14]

$x_{k+1|k} = A_k x_{k|k} + B_k u_k$    (1)

with a measurement $z_k = H_k x_k$. Here u is the input, and x is the state; A is the
state propagation matrix, B is the input matrix, and H is the output matrix.
The a priori estimated error covariance is given by [13]

$P_{k+1|k} = A_k P_{k|k} A_k^T + Q_k$    (2)

where $Q_k$ is the process noise covariance. At time k, two pieces of data are
available. One is the estimate of the state $x_{k|k-1}$ given measurements up to
but not including $z_k$, which is updated as

$x_{k|k} = x_{k|k-1} + K_k (z_k - H_k x_{k|k-1})$    (3)

The error covariance matrix $P_{k|k}$ is obtained from the prediction
$P_{k|k-1}$ as $P_{k|k} = (I - K_k H_k) P_{k|k-1}$. The second piece of data is the gain or
blending factor $K_k$ that minimizes the a posteriori error covariance and that is
given by

$K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$    (4)

where $R_k$ is the measurement noise covariance.
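
To make Eqs. (1)-(4) concrete, here is a minimal scalar sketch for a single pixel intensity. It assumes $A = H = 1$ and no input term, which is an illustrative simplification and not necessarily the configuration used by the authors. The default covariances (16.0, 0.1 and 49.0) are the initial values quoted in the experimental section; the function name is hypothetical.

```python
def kalman_step(x, p, z, q=0.1, r=49.0):
    """One predict/update cycle of a scalar Kalman filter (A = H = 1, no input).

    x, p : previous state estimate and error covariance
    z    : new measurement (e.g. a pixel intensity)
    q, r : process and measurement noise covariances
    """
    # Prediction, Eqs. (1) and (2)
    x_pred = x
    p_pred = p + q
    # Gain, Eq. (4)
    k = p_pred / (p_pred + r)
    # Update, Eq. (3), and covariance update
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# A pixel that constantly shows intensity 100: the estimate approaches 100
# and the error covariance settles to a small steady value.
x, p = 0.0, 16.0
for z in [100.0] * 200:
    x, p = kalman_step(x, p, z)
print(round(x, 2), round(p, 2))
```

The settling of `p` is what the text below refers to as a filter becoming locked.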

Up to two filters, $F = \{f_1, f_2\}$, are defined for each image pixel p. How many
are used, and which one, depends on how well the pixel dynamics are modeled by
the filter (see Fig. 2). At a particular point in time a pixel is either part of the
background or the foreground. A given filter is said to be locked when its
error covariance $P_{k|k}$ converges. Otherwise, it is said to be unlocked. A pixel is
said to be covered when its intensity value is within a certain number a of error
covariances $P_{k|k}$ from the current state estimate value $x_k$. Initially, all pixels
are part of the foreground. As time passes, most of the filters become locked.
The pixels they belong to then become part of the background. Some
of the pixels remain unlocked and stay part of the foreground. When
a pixel is part of the background, it may happen that a pixel intensity value is not
covered. This starts a new filter process and files the pixel as being part
of the foreground. This is the typical case of an object passing in front of a static
object. When the moving object has finally passed by, the previous filter still covers
the intensity values and resumes the process.



3 Trajectory Description 

In general, due to loss of energy, the top view trajectory of the bob is described
by an ellipse. The general form of a quadratic curve is given by

$F(\mathbf{a}, \mathbf{x}) = \mathbf{a}^T \mathbf{x} = ax^2 + bxy + cy^2 + dx + ey + f = 0$    (5)

where $\mathbf{a}^T = (a, b, c, d, e, f)$, $\mathbf{x} = [x^2, xy, y^2, x, y, 1]$, and (x, y) is a point in the
image plane. The constraint for Eq. (5) to represent an ellipse is $b^2 < 4ac$.
Fitzgibbon et al. [4] proposed a method to look for the ellipse's parameters through the
solution to the generalized eigenvalue problem of the system

$S\mathbf{u} = \lambda C\mathbf{u}$    (6)

where $S = D^T D$ is a scatter matrix and C expresses the constraint. That
is, $D = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^T$ is the design matrix and C is a 6 x 6 matrix with all
entries zero but $C(3,1) = C(1,3) = 2$ and $C(2,2) = -1$. Fitzgibbon et al.
show that the solution to the system in Eq. (6) is always an ellipse for the
eigenvector corresponding to the only negative eigenvalue. Furthermore, Eq. (5)
can be transformed into

$F(\mathbf{a}', \mathbf{x}') = \mathbf{a}'^T \mathbf{x}' = a'x'^2 + b'x'y' + c'y'^2 + d'x' + e'y' + f' = 0$    (7)

by virtue of the rotation $\mathbf{x}' = R\mathbf{x}$, with
$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$.
The relationship between $\mathbf{a}$ and $\mathbf{a}'$ is

$a' = a\cos^2\theta + b\sin\theta\cos\theta + c\sin^2\theta$
$b' = 2(c - a)\sin\theta\cos\theta + b(\cos^2\theta - \sin^2\theta)$
$c' = a\sin^2\theta - b\sin\theta\cos\theta + c\cos^2\theta$
$d' = d\cos\theta + e\sin\theta$
$e' = e\cos\theta - d\sin\theta$
$f' = f$    (8)



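There is an angle $\theta$ for which the term b' disappears. This is when the ellipse's
axes are parallel to the principal axes of the reference system. Setting b' = 0 in
Eq. (8) gives $\tan 2\theta = b/(a - c)$, so the angle $\theta$ is given by

$\theta = \begin{cases} \frac{1}{2}\arctan\left(\frac{b}{a - c}\right) & \text{for } a \neq c \\ 45^\circ & \text{otherwise} \end{cases}$    (9)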



The ellipse's center can be found by the translation $\mathbf{x} = \mathbf{x}'' + (h, k)^T$. Replacing this
value into Eq. (5) gives

$ax''^2 + bx''y'' + cy''^2 + (2ah + bk + d)x'' + (bh + 2ck + e)y'' + (ah^2 + bhk + ck^2 + dh + ek + f) = 0$    (10)

For the ellipse to be centered at the origin, both coefficients multiplying
x'' and y'' have to go to zero. This leads to the system

$\begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} = -\begin{pmatrix} d \\ e \end{pmatrix}$    (11)

This way Eqs. (9) and (11) give us the orientation and the center of the
ellipse, respectively.
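
The orientation and center computations can be sketched directly from the conic coefficients. This is an illustrative sketch with hypothetical function names, not the authors' Visual C++ code; it solves the 2 x 2 system of Eq. (11) with Cramer's rule. Note that the arctangent fixes the axis direction only up to 90 degrees, so deciding which axis is the major one requires comparing a' and c' afterwards.

```python
import math

def ellipse_orientation(a, b, c):
    """Angle theta (radians) that cancels the cross term b', as in Eq. (9)."""
    if a == c:
        return math.pi / 4.0
    return 0.5 * math.atan(b / (a - c))

def ellipse_center(a, b, c, d, e):
    """Solve [[2a, b], [b, 2c]] (h, k)^T = -(d, e)^T, as in Eq. (11)."""
    det = 4.0 * a * c - b * b          # positive for an ellipse (b^2 < 4ac)
    if det <= 0.0:
        raise ValueError("coefficients do not describe an ellipse")
    h = (b * e - 2.0 * c * d) / det
    k = (b * d - 2.0 * a * e) / det
    return h, k

# Axis-aligned ellipse (x - 1)^2 / 9 + (y - 2)^2 = 1, expanded to conic form.
a, b, c = 1.0 / 9.0, 0.0, 1.0
d, e = -2.0 / 9.0, -4.0
print(ellipse_center(a, b, c, d, e))      # approximately (1.0, 2.0)
```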



4 Direction of Motion 

The Earth rotation frequency $\omega_e$ has a vertical and a northward pointing horizontal
component of magnitude $\omega' = \omega_e \sin(l)$ and $\omega'' = \omega_e \cos(l)$ respectively, where l
is the latitude of a particular place on Earth. The vertical component causes the
Foucault pendulum to precess clockwise at a rate $\omega'$ when viewed from above. The
horizontal component can be observed as the rate of the passing stars viewed
directly overhead. Latitude l is the angular distance from the equator. The
pendulum is expected to precess due to Earth rotation by

$\theta(t) = \omega' t$    (12)

That is, the angle depends linearly on time.

Kalman filtering is a procedure to robustly and iteratively estimate state
parameters from a series of data readings. The filter optimally combines the data
readings and the current state estimate to produce, based on the uncertainty
about both, a better estimate of the state. The filter predicts based on the
accumulated evidence. New readings are used to update the values. Given the
general framework for state estimation by Kalman filtering, the initial state of
the bob is $x_0 = (\theta_0, \dot\theta_0)^T$. The orientation at time t is given by
$\theta(t) = \theta(0) + \dot\theta(0)\,t$.
This is a continuous equation. Since we are taking measurements at discrete time
intervals, this relation becomes

$\theta_{k+1} = \theta_k + \dot\theta_k\,dt$    (13)

Thus, one can write the system update equation as $x_{k+1} = F x_k$, where
$x_k = (\theta_k, \dot\theta_k)^T$ and $F = \begin{pmatrix} 1 & dt \\ 0 & 1 \end{pmatrix}$.
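
The following sketch illustrates a constant angular velocity Kalman filter built on Eq. (13). It assumes that only the orientation is measured (H = [1, 0]) and that the process noise is q times the identity; both are assumptions made for illustration, as are the function names and numerical values.

```python
def predict(theta, omega, P, dt, q):
    """Propagate the state with theta' = theta + omega * dt and P' = F P F^T + Q.

    P is the 2x2 covariance stored as (p00, p01, p10, p11); Q = q * I (assumed).
    """
    p00, p01, p10, p11 = P
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return theta + omega * dt, omega, (n00, n01, n10, n11)

def update(theta, omega, P, z, r):
    """Correct the prediction with a measured orientation z (H = [1, 0])."""
    p00, p01, p10, p11 = P
    s = p00 + r                       # innovation covariance
    k0, k1 = p00 / s, p10 / s         # Kalman gain
    innovation = z - theta
    P_new = ((1.0 - k0) * p00, (1.0 - k0) * p01,
             p10 - k1 * p00, p11 - k1 * p01)
    return theta + k0 * innovation, omega + k1 * innovation, P_new

# Noise-free orientation growing by 0.5 units per step: the filter
# recovers both the angle and its rate of change.
theta, omega, P = 0.0, 0.0, (10.0, 0.0, 0.0, 10.0)
for k in range(1, 201):
    theta, omega, P = predict(theta, omega, P, dt=1.0, q=1e-4)
    theta, omega, P = update(theta, omega, P, z=0.5 * k, r=1.0)
print(round(theta, 3), round(omega, 3))
```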

5 Experimental Results 

We tested the mathematical framework presented in the previous sections. In
particular, we used a pendulum with a flexible steel cable of length 1.16 m and a
0.200 kg iron bob. The wire is attached to an iron collar. This simple system
is subject to severe effects due to friction with the collar. Nevertheless, this
platform provides us with an experimental setup to test our algorithms. The
time required for the pendulum to describe an ellipse is called its period. The
formula to calculate this quantity is $t = 2\pi\sqrt{l/g}$, where l is the pendulum length
in meters and g is the gravitational field strength. This quantity at sea level is
about 9.81 m/s². In our case, it means that our pendulum completes one swing in
about 2.1606 s.
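
The period quoted above follows directly from the formula; a minimal check, with an illustrative function name:

```python
import math

def pendulum_period(length_m, g=9.81):
    """Small-oscillation period of a simple pendulum, t = 2*pi*sqrt(l/g)."""
    return 2.0 * math.pi * math.sqrt(length_m / g)

print(round(pendulum_period(1.16), 4))   # 2.1606
```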

The adaptive background construction model introduced in §2 allows us to
build a good approximation to the static objects in the scene. This model adapts well
to changes in illumination conditions and to different scenarios. This is important
because it widens the set of environments where the vision system
can be deployed. The error, process and measurement covariances were given
initial values of 16.0, 0.1 and 49.0 respectively. The pendulum center of mass was
computed from the resulting foreground image. For the ellipse fitting algorithm,
we used the last 20 observed positions of the center of mass. The Matlab
routine for the generalized eigenvalue problem was incorporated in our Visual
C++ implementation. After the first 20 frames, an ellipse is computed for every
new center of mass value. For each new value of the center of mass, we discard
the oldest one using a First In, First Out (FIFO) policy. For an experimental
run with 2,000 images, the results are shown in Fig. 3. Fig. 3(a) shows the
resulting estimated elliptic trajectories. The ellipse's major axis is selected as
the predominant direction of motion. Its orientation for this run is presented in
Fig. 3(b) and 3(c).

(a) Drawing of the elliptic trajectory described by the bob during the experiment

(b) Computed orientation (horizontal axis: measurement number)

(c) Detail in (b)

Fig. 3. Computing the predominant direction of motion. The sequence has 2000 frames.
The detail shows about 20 observations
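
The FIFO policy over the last 20 centers of mass can be sketched with a bounded deque. This is only an illustration of the windowing described above; the generator name is hypothetical and the ellipse fit itself is left to the caller.

```python
from collections import deque

def sliding_windows(positions, size=20):
    """Yield the most recent `size` positions each time a new one arrives.

    Nothing is yielded until the window is full; after that the oldest
    position is discarded automatically (First In, First Out).
    """
    window = deque(maxlen=size)
    for p in positions:
        window.append(p)
        if len(window) == size:
            yield list(window)

# With 25 observed positions, an ellipse would be fitted 6 times
# (at frames 20, 21, ..., 25).
centers = [(float(i), float(i)) for i in range(25)]
windows = list(sliding_windows(centers))
print(len(windows))          # 6
```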



6 Conclusion 

In this document, we described a complete image analysis system to compute 
the predominant direction of motion of a Foucault pendulum. The system can 
be used in a wide range of scenarios since it builds an adaptive model of the 
background. At the same time, the system constructs a description of the bob’s
trajectory from the viewpoint of the pendulum’s support. Finally, the observed 
angular orientation of the ellipse’s major axis is filtered using a Kalman formu- 
lation. As a whole, the system has proved to be a robust, accurate and complete 
solution to the problem. 

Our experimentation with a simple and brittle system suggests that our
implementation will behave just as well or even better with a more robust
construction of a Foucault pendulum.

Abstract. This work describes a perceptual user interface based on the Mean
Shift algorithm to control mouse events by gestures using a generic USB camera. To
demonstrate the usefulness of our work, we present two preliminary experiments
that are controlled by the mouth. The first experiment corresponds to a specific
application in charge of controlling a game called Pitfall. The second one is a
generic experiment that evaluates the precision and robustness of the interface.
These preliminary results show the potential and applicability of our study
for disabled people.



1 Introduction 

Body expressions are a natural way for humans and animals to communicate and to
express feelings and intentions in society. In recent years, they have
been used to build interfaces that are more intuitive than traditional ones based on
the keyboard or mouse. These interfaces, commonly called perceptual user interfaces
(PUIs), gave rise to a new concept of interaction between man and machine.

PUIs allow the user to have a bidirectional interaction with the computer in a
simplified way. They are good candidates for learning, monitoring and accessibility
tasks, such as teaching sign language for the deaf, studying athlete performance,
helping disabled people to use the computer, commercial computer games, immersive
3D worlds, etc.

This new generation of interfaces will substantially improve the execution of tasks
through the parallelization of user actions [1]. For example, the use of vision as a second
stream of input, in addition to the mouse, allows the interface to perceive the user and
classify his movements and activities, reacting accordingly [2]. In short, they are a good
hands-free alternative and/or extension to conventional pointing devices. Furthermore,
they are cheaper than early systems that required expensive dedicated hardware such as
headgear or data gloves.

Several interfaces and techniques have been developed in recent years. For
instance, Toyama [3] proposed the use of head motion to position the cursor in a GUI
through the Incremental Focus of Attention; Fi [4] developed an interface that is able to
perform lip-reading using eigensequences; Berard [1] proposed a technique that uses head
motion to navigate in a document; Davis [5] developed a PUI that uses PupilCam together
with anthropometric head and face measures to recognize user acknowledgments from
head gestures; and Nishikawa [6] developed a vision-based interface for controlling the
position of a laparoscope.

In this paper, we propose a vision-based perceptual interface that uses the Mean
Shift algorithm [7, 8] as its core. This algorithm is fast and efficient, which makes it an
excellent candidate for real-time tasks. It has been used successfully for object tracking
in recent years. In our interface, it is used to track and to interpret gestures.

This paper is organized as follows. Section 2 presents the Mean Shift algorithm.
Section 3 describes the Perceptual Interface. Section 4 shows the experiments and,
finally, Section 5 presents future work and conclusions.



2 Mean Shift 

The Mean Shift algorithm was proposed by Comaniciu [8, 9] to track non-rigid objects
with different colors and texture patterns in real time. The basic idea is to pursue a desired
target in the current frame with a fixed-shape window. The algorithm receives as input
a target color distribution (the model) and monitors a candidate region whose color content
matches the input model. It estimates the position of the target in the next image frame,
where the difference between the color distribution of the candidate region and
the input model is expected to be smaller than a predefined threshold [8]. This difference is
calculated from the Bhattacharyya coefficient [10], which provides a reliable similarity measure.

This method has proved to be robust to partial occlusions of the target, large variation 
in the target scale and appearance, rotation in depth and changes in camera position [8]. 
Furthermore, Comaniciu [9] showed that spatially masking the target with an isotropic 
kernel permits the use of a gradient optimization method to perform an efficient target 
localization compared to exhaustive search methods. 

The algorithm works as follows. First, it receives as input the color model of the
target, represented as a probability density function (pdf) in the feature space; second,
it estimates the new target position through a mean vector calculated by the Mean Shift
algorithm. Next, the candidate region moves in the direction pointed to by the mean vector,
and the process repeats.

2.1 The Target Model 

The aim of the method is to follow a given object or feature that moves in a scene.
The first step, therefore, is to characterize that object. A model of the object is chosen
in the following way: a circular region of radius h, centered at the object's center $\mathbf{x}_c$
and encompassing it totally, is selected. For each point (or pixel) $\mathbf{x}_i = (x_i^1, x_i^2)$ in the
region a feature vector is extracted and categorized according to a discrete number of
prototypical features, and the point receives the index of that feature, $u = b(\mathbf{x}_i)$. The
feature distribution $q = \{q_u\}_{u=1 \ldots m}$, which accounts for the fractional occurrence of a
given feature u in the object's region, is calculated by

$q_u = \frac{\sum_{i=1}^{n} k(|\mathbf{x}_i - \mathbf{x}_c|/h)\,\delta(b(\mathbf{x}_i), u)}{\sum_{i=1}^{n} k(|\mathbf{x}_i - \mathbf{x}_c|/h)}$    (1)

where $\mathbf{x}_c$ is the center of the region and $\delta$ is the Kronecker delta function$^1$. Observe that
the distribution satisfies $\sum_{u=1}^{m} q_u = 1$.

The function k(x) is an isotropic kernel with a convex and monotonic decreasing
profile. Its goal is to reduce the importance of the peripheral features when calculating
the target's q distribution. In our experiments, we choose the Epanechnikov kernel

$k(x) = \begin{cases} \frac{1}{2} C_d^{-1} (d + 2)(1 - x^2) & \text{if } x < 1 \\ 0 & \text{otherwise} \end{cases}$    (2)

where $C_d$ is the volume of the unit d-dimensional sphere [11], in our case d = 2, and
$C_d = 4/3\,\pi$.

Specifically, the features of interest are colors. The q distribution represents a color
histogram $q = \{q_u\}_{u=1 \ldots m}$ that incorporates color and spatial information of the image
pixels.
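
A minimal sketch of the kernel-weighted color histogram of Eq. (1) follows. Pixels are given as ((x, y), u) pairs, where u is a precomputed color-bin index; this data layout and the function names are assumptions for illustration. The constant factor of the Epanechnikov kernel is dropped because it cancels between numerator and denominator.

```python
import math

def epanechnikov_profile(r):
    """Kernel profile up to a constant factor: 1 - r^2 inside the unit disc."""
    return 1.0 - r * r if r < 1.0 else 0.0

def color_distribution(pixels, center, h, m):
    """Kernel-weighted histogram over m color bins, as in Eq. (1).

    pixels : iterable of ((x, y), u), with u in range(m) the bin index b(x)
    center : (x, y) of the region center
    h      : radius of the circular region
    """
    hist = [0.0] * m
    total = 0.0
    for (x, y), u in pixels:
        k = epanechnikov_profile(math.hypot(x - center[0], y - center[1]) / h)
        hist[u] += k
        total += k
    if total == 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    return [v / total for v in hist]

# One pixel at the center (bin 0), one at half the radius (bin 1),
# and one outside the region, which is ignored.
pixels = [((0.0, 0.0), 0), ((1.0, 0.0), 1), ((5.0, 0.0), 1)]
q = color_distribution(pixels, center=(0.0, 0.0), h=2.0, m=2)
print(q)   # weights 1 and 0.75, normalized
```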



2.2 The Candidate Region 

A candidate region is a region whose feature distribution is highly similar to the target
color model. In a given image frame we define the feature distribution for a given point
$\mathbf{y}$ in the image, and scale $h_e$, as

$p_u(\mathbf{y}, h_e) = \frac{\sum_{i=1}^{n} k(|\mathbf{x}_i - \mathbf{y}|/h_e)\,\delta(b(\mathbf{x}_i), u)}{\sum_{i=1}^{n} k(|\mathbf{x}_i - \mathbf{y}|/h_e)}$    (3)

The scale $h_e$ defines a tentative size for the object's circular region.

The similarity of the two distributions can be calculated by the Bhattacharyya
coefficient

$\rho(\mathbf{y}) = \rho[p(\mathbf{y}), q] = \sum_{u=1}^{m} \sqrt{p_u(\mathbf{y})\, q_u}$    (4)

This coefficient can be viewed geometrically as the cosine of the angle between the
m-dimensional unit vectors $(\sqrt{p_1}, \ldots, \sqrt{p_m})$ and $(\sqrt{q_1}, \ldots, \sqrt{q_m})$.
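
Eq. (4) reduces to a one-line sum over the bins. The sketch below (function name illustrative) shows the two extreme cases: identical distributions give 1 and distributions with disjoint support give 0.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions, Eq. (4)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

print(bhattacharyya([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(bhattacharyya([1.0, 0.0], [0.0, 1.0]))   # disjoint support
```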

2.3 Target Localization 

The target is localized at the point $\mathbf{y}^*$, and scale $h^*$, such that
$\rho(\mathbf{y}^*, h^*) = \max_{\mathbf{y}, h} \rho(\mathbf{y}, h)$, i.e., the candidate region with the highest similarity.

For real time applications an exhaustive search in $(\mathbf{y}, h)$ space is impractical. Therefore
we have to adopt an incremental procedure where the candidate region undergoes small
corrections at each image frame. It is a reasonable approximation when the object's
motion is small during the time between frames.

A suitable method is gradient ascent on the similarity function, or

$\mathbf{y}_{n+1} = \mathbf{y}_n + \eta \nabla\rho(\mathbf{y}_n)$    (5)



1 The Kronecker delta function returns 1 if its arguments are equal and 0, otherwise. 




A Perceptual User Interface Using Mean Shift 






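Here we focus on the case where the scale $h_e = h$ is fixed at the correct one, and
only the position $\mathbf{y}$ is adjusted. Considering that $\mathbf{y}_0$ is the current estimate of the object's
position, the similarity function can be expanded around this value for small corrections,
using that

$p_u(\mathbf{y}, h) = p_u(\mathbf{y}_0, h) + (\mathbf{y} - \mathbf{y}_0) \cdot \nabla p_u |_{\mathbf{y} = \mathbf{y}_0}$,

as

$\rho(\mathbf{y}) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(\mathbf{y}_0)\, q_u} + \frac{1}{2} \sum_{u=1}^{m} p_u(\mathbf{y}) \sqrt{\frac{q_u}{p_u(\mathbf{y}_0)}}$    (6)

Introducing (3) in (6), we obtain

$\rho(\mathbf{y}) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(\mathbf{y}_0)\, q_u} + \frac{1}{2} \frac{\sum_{i=1}^{n} w(\mathbf{x}_i)\, k(|\mathbf{y} - \mathbf{x}_i|/h)}{\sum_{i=1}^{n} k(|\mathbf{y} - \mathbf{x}_i|/h)}$    (7)

where the weight $w(\mathbf{x}_i)$ is given by

$w(\mathbf{x}_i) = \sum_{u=1}^{m} \delta(b(\mathbf{x}_i), u) \sqrt{\frac{q_u}{p_u(\mathbf{y}_0)}} = \sqrt{\frac{q_{b(\mathbf{x}_i)}}{p_{b(\mathbf{x}_i)}(\mathbf{y}_0)}}$    (8)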



The ratio $r_u = q_u / p_u(\mathbf{y}_0)$ indicates if the color u, for the current estimate of the
object's position ($\mathbf{y}_0$), is above ($r_u < 1$) or below ($r_u > 1$) the model's prediction.
Therefore the weight $w(\mathbf{x}_i)$ indicates the importance the pixel $\mathbf{x}_i$ has in correcting the
object's color distribution. After some manipulation, the gradient of Eq. (7) can be
written as

$\nabla\rho(\mathbf{y}_0) \propto \frac{\sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{y}_0)\, w(\mathbf{x}_i)}{\sum_{i=1}^{n} w(\mathbf{x}_i)}$    (9)

where we used the fact that the derivative of the Epanechnikov kernel is linear in position.
Equation (9) is also called the Mean Shift vector. It points in the direction in which the
center of the kernel has to move in order to maximize the similarity between the color
distributions $p(\mathbf{y})$ and q.

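The new estimate for the object's position is simply

$\mathbf{y}_1 = \frac{\sum_{i=1}^{n} \mathbf{x}_i\, w(\mathbf{x}_i)}{\sum_{i=1}^{n} w(\mathbf{x}_i)}$    (10)

It can be viewed as the mean position calculated from each pixel position $\mathbf{x}_i$, weighted by
$w(\mathbf{x}_i)$, the square root of the ratio $r_u$ of its color. In the application algorithm, if
$\mathbf{y}_1$ overshoots, i.e., if $\rho[p(\mathbf{y}_1), q] < \rho[p(\mathbf{y}_0), q]$, we do
$\mathbf{y}_1 \rightarrow \frac{1}{2}(\mathbf{y}_0 + \mathbf{y}_1)$ until some improvement is attained.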
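
One localization step, Eqs. (8) and (10) plus the halving rule for overshoots, can be sketched as follows. The pixel layout (((x, y), u) pairs with precomputed bin indices), the function names and the cap on the number of halvings are assumptions made for this illustration; it is not the authors' implementation.

```python
import math

def profile(r):
    """Epanechnikov profile up to a constant factor."""
    return 1.0 - r * r if r < 1.0 else 0.0

def distribution(pixels, y, h, m):
    """Kernel-weighted color histogram around y, as in Eq. (3)."""
    hist = [0.0] * m
    for (px, py), u in pixels:
        hist[u] += profile(math.hypot(px - y[0], py - y[1]) / h)
    total = sum(hist)
    return [v / total for v in hist] if total > 0.0 else hist

def similarity(p, q):
    """Bhattacharyya coefficient, Eq. (4)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def mean_shift_step(pixels, y0, h, q, max_halvings=10):
    """Return the new position estimate y1 starting from y0."""
    m = len(q)
    p0 = distribution(pixels, y0, h, m)
    sx = sy = sw = 0.0
    for (px, py), u in pixels:
        # Only pixels inside the candidate region contribute.
        if math.hypot(px - y0[0], py - y0[1]) >= h or p0[u] == 0.0:
            continue
        w = math.sqrt(q[u] / p0[u])            # weight of Eq. (8)
        sx, sy, sw = sx + w * px, sy + w * py, sw + w
    if sw == 0.0:
        return y0
    y1 = (sx / sw, sy / sw)                     # weighted mean of Eq. (10)
    rho0 = similarity(p0, q)
    for _ in range(max_halvings):               # halve the step on overshoot
        if similarity(distribution(pixels, y1, h, m), q) >= rho0:
            break
        y1 = (0.5 * (y0[0] + y1[0]), 0.5 * (y0[1] + y1[1]))
    return y1

# Toy scene: a row of pixels where bin 1 marks the target (x = 9..11).
pixels = [((float(x), 0.0), 1 if 9 <= x <= 11 else 0) for x in range(21)]
q = distribution(pixels, (10.0, 0.0), 3.0, 2)   # model taken at the true center
print(mean_shift_step(pixels, (8.0, 0.0), 3.0, q))
```

Starting two pixels to the left of the target, the estimate moves to the right, towards the pixels whose color is under-represented in the current window.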
3 Perceptual Interface

Our interface uses the Mean Shift algorithm to track and to interpret gestures using a
generic USB camera. Initially, the user selects the target to be tracked, for instance the
nose, mouth, hand, etc. Next, he determines actions and associates them with different target
configurations. We adopted two different controlling strategies. The first is generic: the
user controls the computer using a virtual mouse, i.e., the mouse movement and click
events are set by gestures. The mouse movement is obtained by interpreting the target
position, and the click event is determined by the target configuration. The second strategy
is position specific: it depends on the absolute position of the target in the image frame
captured by the webcam. Both are discussed below.

3.1 Target Selection 

The selection is done by using a box to delimit a region that contains the target in the
image coming from the video stream. Figure 1 shows the selection of a target region to
be tracked. In this case, the target corresponds to the mouth of the researcher. Using the
same idea, the user can associate different target configurations with mouse actions. For
instance, Figure 1 shows two target configurations. The open mouth is associated
with the left mouse click, whereas the closed mouth is associated with the tracking
algorithm that moves the mouse pointer in the environment.

3.2 Neutral Zone 

The neutral zone corresponds to a specific radial area in the current image frame where no
mouse movement or click event is performed. It is defined at the beginning of the control
process; its center $c^z = (c_1^z, c_2^z)$ corresponds to the center of the target and its radius
$r_z$ is set by the user. The neutral zone controls the sensitivity of the system. When the
center of the target is outside of the neutral zone, the control takes place. Region 5
in Figure 2 corresponds to the neutral zone.

3.3 Specific Control Management 

In the specific control, the user maps regions outside of the neutral zone into actions.
Figure 2 illustrates eight commonly used regions. In our experiments with Microsoft
PowerPoint, we used regions 4 and 6 to go forward and back in the presentation,
respectively. In our experiments with Windows Media Player, we used regions 4 and 6
to go forward and back, respectively, in the playlist, and regions 2 and 8 to increase
and decrease the volume, respectively. In this case, four different regions were needed
to control the application.

Fig. 1. Target selection. The red box delimits the region to be tracked

Fig. 2. Image regions used to define different actions

An action $a_n$ associated with a region $n$ is executed only when a transition from the
neutral zone to region $n$ occurs. No action is executed on a transition between
regions outside the neutral zone. After an action is executed, the user must
return to the neutral zone before triggering another action. We intend to implement a mechanism
that fires a sequence of actions instead of a single action each time: rather than performing
$m$ transitions from the neutral zone to region $n$ to execute action $a_n$ $m$ times,
the user will keep the target in region $n$ for a period of time. In this case,
the action is performed every r seconds.
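
The transition rule above can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the class name, the `step` method, and the `repeat_period` parameter are hypothetical, region identifiers follow Figure 2 with region 5 as the neutral zone, and the mapping from target position to region number is assumed to be computed elsewhere. The auto-repeat branch corresponds to the planned mechanism, not to something the text reports as implemented.

```python
NEUTRAL = 5  # region 5 is the neutral zone in Figure 2


class SpecificController:
    """Fire an action only on a neutral-zone -> region transition (hypothetical sketch)."""

    def __init__(self, actions, repeat_period=None):
        self.actions = actions              # dict: region number -> action label
        self.repeat_period = repeat_period  # seconds between repeats, or None to disable
        self.armed = True                   # True while the target rests in the neutral zone
        self.active_region = None
        self.last_fire = None

    def step(self, region, now):
        """Process the region observed at time `now`; return an action label or None."""
        if region == NEUTRAL:
            # Returning to the neutral zone re-arms the controller.
            self.armed = True
            self.active_region = None
            return None
        if self.armed:
            # Transition from the neutral zone into a region: fire once.
            self.armed = False
            self.active_region = region
            self.last_fire = now
            return self.actions.get(region)
        if (self.repeat_period is not None
                and region == self.active_region
                and now - self.last_fire >= self.repeat_period):
            # Planned extension: repeat while the target stays in the same region.
            self.last_fire = now
            return self.actions.get(region)
        # Transitions between regions outside the neutral zone fire nothing.
        return None
```

For example, with `SpecificController({4: "forward", 6: "back"})`, moving the target from region 5 to region 4 and then directly to region 6 yields only the "forward" action; the user must pass through region 5 before "back" can fire.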

3.4 Generic Control Management 

In the generic control, the user can move the mouse pointer and trigger the left mouse
click event anywhere. The pointer motion is produced from the displacement of
the target in the source image. When the target center $c^t = (c_1^t, c_2^t)$ is outside the
neutral zone, the target displacement vector $d = (d_1, d_2)$ is computed, where
$d_1 = c_1^t - c_1^z$ and $d_2 = c_2^t - c_2^z$ are measured from the center $c^z$ of the
neutral zone.

The mouse position is updated as follows. First, we calculate the target displacement
$\Delta d$ beyond the neutral zone. This displacement has an upper bound $u$ that constrains
the pointer speed: $\Delta d = \min\{|d| - r_z, u\}$. The value of $u$ is defined by the
user at run time.

The speed $v$ is $(\Delta d / u)\, v_{max}$, where $v_{max}$ defines the maximum speed of the
mouse pointer at each step. The mouse pointer position $p = (p_1, p_2)$ is updated by

$$p_{t+1} = p_t + v \left( \frac{d_1}{|d|}, \frac{d_2}{|d|} \right) \qquad (11)$$

This update is performed at each step of the algorithm, much as a joystick works.
Equation 11 controls the mouse speed according to $v$, i.e., the greater the value of $v$, the
faster the mouse pointer moves. This allows smoother, faster, and more precise motion
than a linear update, in which the displacement of the mouse pointer is constant.
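
The joystick-style update can be sketched in Python as follows. This is an illustrative reading of the update rule, not the authors' implementation; the function and parameter names are assumptions, and positions are plain (x, y) tuples.

```python
import math


def update_pointer(p, target_center, zone_center, zone_radius, u, v_max):
    """Return the new pointer position after one update step."""
    # Displacement of the target from the neutral-zone center.
    d1 = target_center[0] - zone_center[0]
    d2 = target_center[1] - zone_center[1]
    norm = math.hypot(d1, d2)
    if norm <= zone_radius:
        # Inside the neutral zone: the pointer does not move.
        return p
    # Displacement beyond the neutral zone, capped at the upper bound u.
    delta = min(norm - zone_radius, u)
    # Speed grows linearly with delta up to v_max.
    v = (delta / u) * v_max
    # Move along the unit vector d / |d|.
    return (p[0] + v * d1 / norm, p[1] + v * d2 / norm)
```

With a zone radius of 10, u = 20, and v_max = 8, a target displaced by (12, 16) from the zone center has |d| = 20, so the capped displacement is 10, the speed is 4, and the pointer moves by (2.4, 3.2) in that step.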

To simulate the left mouse click event, we need to determine when the target configuration
associated with tracking has changed to the target configuration associated with the click
event. To detect this change, we first extract the color models of both configurations
using Eq. (1).

Let $q_t$ and $q_c$ be the color models of the target configurations associated with the track
and click events, respectively. To determine which action to take, we compute
the similarity between the candidate region, discussed in Section 2.2, and these color
models. This computation identifies which model best matches the candidate
region, and it is done only when the target center is inside the neutral zone.
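
The comparison against the two color models might look like the sketch below. The similarity measure of Section 2.2 is not reproduced in this part of the text, so the Bhattacharyya coefficient used here is an assumption (it is a common choice in mean shift tracking), as are the function names and the representation of each color model as a normalized histogram.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms (assumed measure)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def classify(candidate, q_track, q_click):
    """Return 'click' or 'track', whichever model is more similar to the candidate."""
    if bhattacharyya(candidate, q_click) > bhattacharyya(candidate, q_track):
        return "click"
    return "track"
```

Ties are resolved in favor of tracking here, so an ambiguous frame never produces a click; this is a design choice of the sketch rather than something stated in the text.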

Misclassification can occur because of luminosity differences. To avoid it, we execute
an action only when the associated target configuration is identified more than $t$ times
in succession; otherwise, the configuration is treated as noise.
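
The noise filter described above can be sketched as a run-length counter. The class and method names are hypothetical; the only rule taken from the text is that a configuration is accepted after it has been identified more than t times in succession.

```python
class ConfigurationFilter:
    """Accept a configuration only after more than t consecutive identifications."""

    def __init__(self, t):
        self.t = t
        self.current = None  # label seen in the current run
        self.count = 0       # length of the current run

    def update(self, label):
        """Feed the label identified in this frame; return it once stable, else None."""
        if label == self.current:
            self.count += 1
        else:
            # A different label restarts the run and is treated as noise for now.
            self.current = label
            self.count = 1
        return label if self.count > self.t else None
```

With t = 2, the sequence click, click, click returns None, None, and then "click"; a single stray "click" frame between "track" frames never reaches the threshold.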

4 Experiments 

This section presents two experiments using our interface. The first shows results
for the specific control management: we present snapshots of the
interface controlling a game called Pitfall. The second experiment concerns the generic
control management and aims to provide a quantitative measure of the interface in terms of
learning and motion precision. All experiments use the RGB color space. We observed that RGB
outperforms HSI, because RGB implicitly incorporates spatial information, such as depth, that is
generally disregarded by the HSI color model. This feature allows the tracking algorithm to
follow specific parts of an object with the same color.

4.1 Specific Control Management Experiment 

This experiment illustrates the usefulness of the interface for controlling a commercial
game using the mouth. The game consists of guiding a treasure hunter through a jungle.
The hunter runs and jumps through a 2D side-view environment, avoiding hazards such as
crocodile-filled waters and sinkholes. Figure 3 shows the researcher controlling the
hunter.

In Figure 3, the blue circle corresponds to the neutral zone and the red box delimits
the target. Observe that the mouth position, relative to the neutral zone, determines the
action that the hunter executes. When the mouth is in region 4 or 6, the hunter
moves to the left or to the right, respectively (see Section 2). When it is in region 2
or 8, the hunter jumps or stoops, respectively.

4.2 Generic Control Management Experiment 

This experiment aims to show the precision and ease of use of our interface. We use a
maze-like environment, illustrated in Figure 4. The user was in charge of leading the red
block (traveller) from the start position to the final position while avoiding walls. We
performed 6 trials for each of 3 users. After each trial, we measured the time spent
and the length of the path followed by the traveller.

Fig. 3. Experiment with the Pitfall game

The walls along the corridor do not block the traveller during its motion. This feature
increases the complexity of the task, because nothing constrains the traveller's motion.
Figure 5a shows the evolution of learning for each user over the trials. The y axis
corresponds to the length of the path, in pixels, followed by the traveller, and the x axis
indicates the respective trial. The solid line is the average path length over the trials;
the other lines correspond to the individual users. Observe that the path length decreases as
the user learns to use the interface. After only 6 trials, the path length produced by each
user is close to the optimal path length, previously estimated at 1120 pixels.

Fig. 5. Experiment results. a) Distance as a function of the number of learning trials. b) Time
spent to perform the task as a function of the number of learning trials

The average length of the path followed by the traveller after the 6 trials is
$\bar{\ell} = 1160.9$ pixels, with standard deviation $\sigma_\ell = 169.74$ pixels. Figure 5b
illustrates the time spent to perform the task. The users spent an average of
$\bar{t} = 15.23$ s to reach the goal, with a standard deviation of $\sigma_t = 4.99$ s.



5 Conclusion 

This paper describes an approach based on the mean shift algorithm for controlling the
computer by gestures using a generic webcam. We validated the usefulness of our approach
in two different tasks. The first task is specific and consists of controlling a game called
Pitfall. The user controls a treasure hunter in a jungle using the motion of the mouth.
The motion of the hunter is limited to 4 specific actions (jump, stoop, move left, move
right). We observed that the users quickly learned to control the treasure hunter. This
observation was confirmed in the second task.

The second task is generic and aims to show the precision and ease of use of the interface.
In this case, the user was in charge of leading a traveller from a start position to a final
position while avoiding walls. The users guided the traveller along a path close to the
optimal path in a short period of time, about 15 s.

These results are very promising. However, some challenges remain.
We use only color information to guide the tracking algorithm. This information
is very sensitive to luminosity differences, which can easily cause misclassification.
We intend to incorporate texture or spatial information into the color distribution.
Furthermore, the tracking algorithm does not handle fast-moving targets well: it fails if the
displacement of the target between two successive frames is larger than the radius of
the search region.

However, it is important to stress that neither markers nor special clothing were needed
to make the interface work. This approach is a viable and cheap candidate for
helping disabled people in daily tasks.

Abstract. There has been increasing interest in 3D surface contouring
technologies over the last decade. 3D surface contouring techniques have
numerous applications in design and manufacturing. 3D contouring technology
is also used in reverse engineering, where the construction of a CAD
model from a physical part is required. We propose a system based on the
projected fringe technique. We present early results of a potentially
fast, efficient and reliable method to extract the surface contour of 3D
external objects. A ViewSonic PJ551 digital projector is used as the active
light source, and a DFK 50H13 CCD camera with a DFG/LC1 grabber
is used to grab images. One advantage of the projected fringe technique is
that it allows dense 3D data to be acquired in a few seconds.



1 Introduction 

Conventionally, coordinate measurement machines (CMMs) have been used to
measure 3D surface coordinates. CMMs are well known and widely accepted
in industry, but they have two major drawbacks. First, they require
point-by-point scanning, which is time consuming and makes it difficult to meet
the requirements of on-line measurement. Second, they require mechanical contact
during measurement, which can wear the probe itself and damage
the measured surface. This hinders their use on delicate surfaces, such
as optical surfaces, thin films, and silicon wafers.

Optical 3D sensors measure the shape of objects without the need to physi- 
cally probe surfaces. They are faster, cheaper and provide a higher measurement 
density than traditional techniques. Various different optical shape acquisition 
methods have been developed for a wide range of applications, including inspec- 
tion, robotic vision, machine vision, medical diagnostics, large infrastructure 
inspection (buildings, bridges, roads, and tunnels) and corrosion inspection. 

One of these methods is the projected fringe technique, which
provides dense and reliable surface acquisition by projecting
fringe patterns and recovering phase information by phase-shifting interferometry
(PSI).

2 State of the Art

The projected fringe technique is an extension of triangulation [12] for out-of-plane
and topography measurement. Fringe patterns are projected onto the object
surface, where they are distorted in accordance with the object height. Unlike
projection moiré techniques, the distorted fringe patterns are captured directly
by a CCD camera, and the surface height is reconstructed from the deformed
fringes instead of using a reference grating to create fringes [11, 14]. The main
advantage of this technique is that it is very easy to set up and does not require
intensive calculation.



2.1 Bases of the Projected Fringe Technique 

Digital fringe projection is based on traditional PSI techniques, but fringe projection
has advantages in phase-shifting accuracy, system simplicity, and
measurement speed. PSI is not a specific optical hardware configuration but
rather a data collection and analysis method that can be applied to a great
variety of testing situations.

An active light source projects fringe patterns onto the test object, and the varying
depths of the object's surface cause phase variations in the projected pattern.
These phase changes are used to find the surface coordinates of the object
to be measured, using PSI and phase unwrapping techniques [9, 15, 16].

The basic equation for two-beam interference is

$$I(x,y) = I_a + I_b \cos(\theta(x,y)) , \qquad (1)$$

or

$$I(x,y) = I_{avg}\,(1 + \gamma \cos(\theta(x,y))) . \qquad (2)$$

Let $\theta(x,y)$ be the sum of a constant value, $\delta$, and a variable term, $\phi(x,y)$, that
depends on the position $(x,y)$. The intensity can then be written as

$$I(x,y,\delta) = I_{avg}\,(1 + \gamma \cos(\phi(x,y) + \delta)) . \qquad (3)$$

While Equation 3 is the most common way of writing the intensity distribution
for two-beam interference, it is often convenient to expand $\cos(\phi + \delta)$
in terms of the products $\cos(\phi)\cos(\delta)$ and $\sin(\phi)\sin(\delta)$. That is,

$$I(x,y,\delta) = I_{avg} + I_{avg}\gamma \cos(\phi(x,y))\cos(\delta) - I_{avg}\gamma \sin(\phi(x,y))\sin(\delta) . \qquad (4)$$

Letting $a_0 = I_{avg}$, $a_1 = I_{avg}\gamma\cos(\phi(x,y))$, and $a_2 = -I_{avg}\gamma\sin(\phi(x,y))$, we
can write

$$I = a_0 + a_1 \cos(\delta) + a_2 \sin(\delta) . \qquad (5)$$

It is important to note that

$$\tan(\phi) = \frac{-a_2}{a_1} \qquad (6)$$

and

$$\gamma = \frac{\sqrt{a_1^2 + a_2^2}}{a_0} . \qquad (7)$$

Equations 5 and 6 are the two most useful in PSI.

Many algorithms have been developed for PSI, such as the 3-step algorithm,
least-squares algorithms, the Carré algorithm, and the Hariharan algorithm. The major
differences between these algorithms are how the reference phase is varied and the
number of times and the rate at which the interference pattern is measured.
More detailed descriptions of these algorithms are given in [13].
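
To make the role of the coefficients in Equation 5 concrete, the sketch below recovers the phase and the modulation from four intensity samples taken at reference shifts of 0, $\pi/2$, $\pi$, and $3\pi/2$. The choice of four equally spaced shifts is an assumption made for illustration (it is one simple sampling scheme, not necessarily the one used by the authors), and the function name is hypothetical. At these shifts Equation 5 gives $I_0 = a_0 + a_1$, $I_1 = a_0 + a_2$, $I_2 = a_0 - a_1$, and $I_3 = a_0 - a_2$, from which the coefficients follow directly.

```python
import math


def four_step_phase(i0, i1, i2, i3):
    """Recover (phi, gamma) from intensities sampled at delta = 0, pi/2, pi, 3*pi/2."""
    a0 = (i0 + i1 + i2 + i3) / 4.0   # average intensity
    a1 = (i0 - i2) / 2.0             # coefficient of cos(delta)
    a2 = (i1 - i3) / 2.0             # coefficient of sin(delta)
    phi = math.atan2(-a2, a1)        # Equation 6, resolved over the full circle
    gamma = math.hypot(a1, a2) / a0  # Equation 7
    return phi, gamma
```

Using `atan2` instead of a plain arctangent of the ratio keeps the sign information of both coefficients, so the recovered phase covers the whole interval from $-\pi$ to $\pi$; the result is still wrapped and needs unwrapping.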

2.2 Phase Unwrapping Algorithms 

In most fringe analysis methods, the arctangent function is used to obtain the
phase of the fringe pattern. This function returns values between
the limits $-\pi$ and $\pi$. Hence, the result is given modulo $2\pi$, and discontinuities
of value close to $2\pi$ appear in the resulting phase distribution [6]. These $2\pi$
discontinuities should be removed, and the process of removing them is
called unwrapping [8]. Many different algorithms exist, but a correct solution is
not guaranteed, and very long execution times are often involved [5].
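
For a one-dimensional, noise-free signal, removing the $2\pi$ discontinuities can be sketched as below. The sketch assumes that the true phase changes by less than $\pi$ between neighboring samples, so any larger jump is attributed to wrapping; the function name is hypothetical.

```python
import math


def unwrap_1d(wrapped):
    """Remove 2*pi jumps from a list of wrapped phase values."""
    if not wrapped:
        return []
    out = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        diff = cur - prev
        if diff > math.pi:
            # Jump upward by about 2*pi: the true phase wrapped downward.
            offset -= 2 * math.pi
        elif diff < -math.pi:
            # Jump downward by about 2*pi: the true phase wrapped upward.
            offset += 2 * math.pi
        out.append(cur + offset)
    return out
```

Because the offset accumulates along the sequence, a single wrong decision caused by noise or by a real surface step shifts every later sample, which is why practical two-dimensional unwrapping is harder than this sketch suggests.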

In principle, phase unwrapping is simple. In practice, it can become very complicated
because of various error sources, especially when an automated phase unwrapping
process is required. The error sources that arise most frequently in a
fringe pattern are as follows:

1. Background or electronic noise produced during data acquisition. 

2. Low data modulation points due to low surface reflectivity. 

3. Abrupt phase changes due to surface discontinuities or shadows. 

4. Violation of the sampling theorem. 

Most phase unwrapping algorithms can handle (1) and (2). Error source (4) can be avoided
by changing the system setup. For error source (3), one needs a priori knowledge
of the object or special techniques. Otherwise, (3) results in path-dependent
phase unwrapping, which is unacceptable.

A major goal of fringe analysis is to automate the phase unwrapping process.
Automatic techniques are essential if systems are to run unsupervised or if high-speed
processing is required. The following list briefly reviews some
representative phase unwrapping techniques.

1. Phase Fringe Scanning Method. Greivenkamp proposed this method in [7].
A horizontal line of the phase image is unwrapped first. Then, starting at
each point on this line, the whole phase map is unwrapped vertically. This is
the most straightforward method of phase unwrapping and is therefore the
fastest of all phase unwrapping techniques. Unfortunately, this
method cannot handle phase images of objects with areas of low surface
reflectivity.

2. Phase Unwrapping by Sections. Arevallilo and Burton developed this algorithm
[1, 3]. An image is subdivided into four sections. If a section is
larger than 2 x 2 pixels, it is further subdivided into four sections.
It is easy to unwrap a 2 x 2 area, and two areas can be connected by checking
their common edge. After the subareas have been unwrapped, they are joined
together. Points on the edge are traced to decide whether a shift should be made, up
by $2\pi$, down by $2\pi$, or not at all, according to a weighting criterion. This
method tends to provide globally optimal unwrapping but is complicated
by the need to deal with error-source areas. The weighting criterion for connecting
sections is also hard to generalize.

Fig. 1. Layout of the measurement system

3. Phase Unwrapping by Using Gradient. This method was proposed by Huntley
[10] in an attempt to solve the problem of surface discontinuity. Since a large
phase difference between two adjacent pixels increases the likelihood
of a discontinuous phase change, this algorithm always chooses the
direction of the smallest gradient to unwrap the phase. The phase at each pixel
is compared with that of its 8 neighboring pixels, and the phase is unwrapped in the
direction in which the phase difference is smallest.

The drawback of this algorithm is that invalid pixels are eventually unwrapped as well,
which introduces phase unwrapping errors. One solution to this
problem is to set a threshold on the phase difference to reject pixels that are invalid
because of noise or phase discontinuity. The threshold has to be flexible in order to
adapt to different circumstances, which makes automatic phase unwrapping
difficult.

Using the first-order phase difference may lead to the wrong
unwrapping direction being chosen. A second-order difference
method is proposed in [2, 4] to improve the performance. Phase unwrapping based
on the least gradient does provide the most logical approach to phase unwrapping.
However, the method may not be able to unwrap all the pixels of interest
automatically or to handle zones of high curvature in the phase map.
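
Of the three techniques listed above, the scanning method (item 1) is the simplest to sketch: the first row is unwrapped horizontally, and each column is then unwrapped vertically starting from that row. The code below is a minimal illustration on a noise-free phase map stored as a list of rows; the function names are hypothetical, and no masking of low-reflectivity or invalid pixels is attempted, which is exactly the weakness noted for this method.

```python
import math


def wrap(x):
    """Map an angle to its wrapped value in the interval (-pi, pi]."""
    return math.atan2(math.sin(x), math.cos(x))


def unwrap_scan(phase):
    """Unwrap a 2D wrapped phase map: first row horizontally, then columns vertically."""
    rows, cols = len(phase), len(phase[0])
    out = [[0.0] * cols for _ in range(rows)]
    out[0][0] = phase[0][0]
    # Step 1: unwrap the first horizontal line.
    for c in range(1, cols):
        out[0][c] = out[0][c - 1] + wrap(phase[0][c] - phase[0][c - 1])
    # Step 2: starting at each point of that line, unwrap downward.
    for c in range(cols):
        for r in range(1, rows):
            out[r][c] = out[r - 1][c] + wrap(phase[r][c] - phase[r - 1][c])
    return out
```

Each pixel is visited once along a fixed path, which explains the speed of the method; it also means that an error at one pixel propagates to every pixel below it in the same column.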

3 The Shape Measurement Method 

Our method works in the following manner: 1. a digital projector generates the
active light source; 2. variations in surface depth cause
phase changes in the light pattern projected onto the test object; 3. a CCD camera
and an image grabber card capture the images. A sketch of the optical
system is shown in Figure 1.





